IMPROVED SURFACE INSPECTION SYSTEM
WITH MISREGISTRATION ERROR CORRECTION 
AND ADAPTIVE ILLUMINATION 



BACKGROUND OF THE INVENTION

This invention relates in general to surface 
inspection systems and in particular to an improved 
surface inspection system employing misregistration 
error correction and adaptive illumination. 

The improved surface inspection system of this application is
particularly useful for inspecting semiconductor wafers, photomasks,
reticles, ceramic tiles and other surfaces for anomalies. The size of
semiconductor devices fabricated on silicon wafers has been
continually reduced. The shrinking of semiconductor devices to
smaller and smaller sizes has imposed a much more stringent
requirement on the sensitivity of wafer inspection instruments, which
are called upon to detect contaminant particles and pattern defects
that are small compared to the size of the semiconductor devices. At
the same time, it is desirable for wafer inspection systems to
provide an adequate throughput so that these systems can be used for
in-line inspection to detect wafer defects. One example is the
SURFSCAN® AIT system marketed by Tencor Instruments of Milpitas,
California.

An anomaly detection system typically detects light scattered,
reflected or otherwise modified by small areas of a sample. The data
associated with such small areas are processed for anomaly detection.
If the data for a small area are erroneously associated with a
different small area, errors in identifying anomalies may result;
such errors are known as misregistration errors.

In a laser scanning system such as the SURFSCAN® AIT, an
acousto-optic deflector (AOD) may be used to sweep the laser beam
across the surface to be inspected. In a system of this type,
misregistration errors can arise for a number of reasons. For
example, jitter in firing the start of the sweep signal of the AOD
and non-linearity in the chirp rate of the AOD can cause
misregistration errors. While the AOD is used to cause the laser beam
to sweep in one direction, a mechanical stage is used to move the
wafer surface in a direction transverse to the sweep direction of the
laser beam. Thus, mechanical instability of the mechanical stage or
other mechanical vibrations in the optical/mechanical system can also
result in misregistration. Pointing instability of the laser used for
providing the scanning beam can also cause misregistration.
Misregistration errors can arise in the direction of sweep of the
laser beam as well as in a direction transverse to the sweep
direction. Where there are repeating patterns on or in the sample,
sometimes it is advantageous to compare corresponding areas of the
sample for anomaly detection. Misregistration errors will also cause
such comparison to yield erroneous results. It is, therefore,
desirable to provide an improved surface inspection system where
misregistration errors due to the above-described reasons and other
reasons are corrected. In this context, correction of misregistration
includes reducing as well as eliminating errors due to
misregistration.

SUMMARY OF THE INVENTION 
This invention is based on the observation that in many optical
inspection systems, data samples in the same neighborhood are
correlated so that such correlation may be used for reducing or
correcting misregistration. In a laser scanning system, for example,
the point spread function of the scanning laser beam is such that
data samples on adjacent scan lines are highly correlated, and this
correlation can be taken advantage of in misregistration correction.
One aspect of the invention is directed towards a method for
correcting misregistration errors in a system for detecting anomalies
in a specimen such as a semiconductor wafer, said system illuminating
a surface of a specimen at a two-dimensional array of locations in
rows and columns. The method comprises the steps of generating a
two-dimensional array of data samples in rows and columns, each data
sample representing light modified by the specimen at a corresponding
location in the two-dimensional array of locations of the specimen,
and defining one-dimensional groups of data samples, each group of
data samples or portion thereof defining a vector. The method further
comprises, for at least one vector, providing a reference vector that
is an average of selected vectors of data samples; and processing the
at least one vector and the reference vector to correct for
misregistration.
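
As an illustration only, the averaging and processing steps of this
aspect can be sketched as follows. The function names, the use of
normalized crosscorrelation as the processing step, and the search
range max_shift are assumptions made for this sketch; the summary
above states only that a reference vector is an average of selected
vectors and that it is processed together with the current vector.

```python
from math import sqrt


def reference_vector(vectors):
    """Average equal-length vectors element by element (assumed averaging rule)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def normalized_xcorr(a, b):
    """Zero-mean normalized crosscorrelation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_offset(current, reference, max_shift=2):
    """Return the shift s for which current[i + s] best matches reference[i].

    Only the overlapping part of the two vectors is compared at each shift.
    Both vectors are assumed to have the same length.
    """
    n = len(reference)
    best_shift, best_score = 0, -2.0  # correlation is never below -1
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)
        score = normalized_xcorr(current[lo + s:hi + s], reference[lo:hi])
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


# Two earlier scan lines are averaged into a reference vector.
previous_lines = [[0, 1, 4, 1, 0, 0, 0, 0], [0, 1, 6, 1, 0, 0, 0, 0]]
ref = reference_vector(previous_lines)
# The current line carries the same feature displaced by one sample.
shift = best_offset([0, 0, 1, 5, 1, 0, 0, 0], ref)
```

In this sketch the returned shift is the number of samples by which
the current vector would be moved back to register it with the
reference vector.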

Another aspect of the invention is directed towards a method for
correcting misregistration errors in a system for detecting anomalies
in a specimen such as a semiconductor wafer, said system illuminating
a surface of the specimen at a two-dimensional array of locations in
rows and columns. The method comprises the steps of generating a
two-dimensional array of data samples in rows and columns, each data
sample in a row and column representing light modified by the
specimen at a corresponding location in the two-dimensional row and
column of locations. The method further comprises comparing the data
samples of a target subarray and the data samples of each of a
plurality of reference subarrays, or between signals derived
therefrom, the target and the reference subarrays having the same
dimensions, one of the reference subarrays being at a reference
position and the remaining reference subarrays being offset from the
reference position by at least one row and/or column of data samples
to select a pair of offset values; and repositioning the target
subarray according to the pair of offset values.
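
A minimal sketch of such a comparison is given below, assuming that
the comparison is a sum of absolute differences and that the pair of
offset values is the one giving the smallest sum. The function name,
the comparison measure and the search range are assumptions of this
sketch, not statements of the method summarized above.

```python
def select_offsets(target, reference, top, left, max_shift=1):
    """Return the (row, column) offset pair whose reference subarray
    differs least from the target subarray.

    target    -- 2-D list, the target subarray
    reference -- 2-D list, the full reference array
    top, left -- reference position of the subarray inside `reference`
    The caller must leave a margin of max_shift samples around the
    reference position so that every offset subarray exists.
    """
    height, width = len(target), len(target[0])
    best_pair, best_sum = (0, 0), float("inf")
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r0, c0 = top + dr, left + dc
            # Sum of absolute residuals between target and offset subarray.
            total = sum(
                abs(target[i][j] - reference[r0 + i][c0 + j])
                for i in range(height)
                for j in range(width)
            )
            if total < best_sum:
                best_pair, best_sum = (dr, dc), total
    return best_pair


# Synthetic reference array with a distinct value pattern.
reference = [[5 * r * r + 3 * c * c + r * c for c in range(6)] for r in range(6)]
# Target is the 3x3 block one row down and one column left of position (1, 2).
target = [row[1:4] for row in reference[2:5]]
offsets = select_offsets(target, reference, top=1, left=2)
```

The selected pair can then be used to reposition the target subarray
before any sample-by-sample comparison is made.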

Yet another aspect of the invention is directed towards a method for
correcting misregistration errors in a system for detecting anomalies
in a specimen such as a semiconductor wafer, said system illuminating
a surface of the specimen at a two-dimensional array of locations in
rows and columns. The method comprises the steps of generating a
two-dimensional array of data samples in rows and columns, each data
sample in a row and column representing light modified by the
specimen at a corresponding location in the row and column of
locations. The method further comprises crosscorrelating a target
subarray and each of a plurality of reference subarrays of data
samples, or signals derived therefrom, to obtain a plurality of sets
of crosscorrelation values, the target and the reference subarrays
having the same dimensions, one of the reference subarrays being at a
reference position and the remaining reference subarrays being offset
from the reference position by at least one row and/or column of data
samples; and selecting the reference subarray that corresponds to a
set of crosscorrelation values according to a criterion.
One more aspect of the invention is directed towards a method for
detecting anomalies in a specimen such as a semiconductor wafer, said
method comprising the steps of scanning a light beam across the
specimen along scan lines and detecting light originating from the
light beam after such light has been modified by the specimen. The
method further comprises controlling intensity of the light beam as a
function of reference data to correct for variations in the detected
light caused by optical characteristics of the specimen apart from
the anomalies.
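
One way to picture this intensity control is sketched below. The
sketch assumes that the reference data give the expected background
signal at each location under unit illumination, and that the beam
intensity is set inversely proportional to that value within fixed
limits. The control law, the limits and the names are hypothetical
and are offered only to illustrate the idea.

```python
def beam_intensity(reference_level, setpoint=1.0, i_min=0.1, i_max=4.0):
    """Choose a beam intensity from the reference data for one location.

    reference_level -- expected background signal at unit intensity (assumed)
    setpoint        -- desired detected background level (assumed)
    The result is clamped to the range [i_min, i_max].
    """
    if reference_level <= 0:
        return i_max
    return max(i_min, min(i_max, setpoint / reference_level))


# Background response of three locations, taken from reference data.
reference_levels = [0.5, 1.0, 2.0]
intensities = [beam_intensity(level) for level in reference_levels]
# With matching background, the detected signal is flattened to the setpoint.
detected = [level * i for level, i in zip(reference_levels, intensities)]
```

Under these assumptions, a location whose actual response exceeds its
reference level produces a detected signal above the setpoint, which
is what remains to be examined as a possible anomaly.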

Yet another aspect of the invention is directed towards an apparatus
for detecting anomalies in a specimen such as a semiconductor wafer,
comprising means for scanning a light beam across the specimen along
scan lines and a detector device detecting light originating from the
light beam after such light has been modified by the specimen; and
means for controlling intensity of the light beam as a function of
reference data to correct for variations in the detected light caused
by optical characteristics of the specimen apart from the anomalies.

Yet one more aspect of the invention is directed towards an apparatus
for correcting misregistration errors in a system for detecting
anomalies in a specimen such as a semiconductor wafer, said system
illuminating a surface of the specimen at a two-dimensional array of
locations in rows and columns. The apparatus comprises means for
generating a two-dimensional array of data samples in rows and
columns, each data sample representing light modified by the specimen
at a corresponding location in the two-dimensional array of
locations, and means for defining one-dimensional groups of data
samples, each group of data samples or a portion thereof defining a
vector. The apparatus further comprises means for providing a
reference vector that is an average of selected vectors of data
samples for at least one vector in the two-dimensional array; and
means for processing the at least one vector and the reference vector
to correct for misregistration.

An additional aspect of the invention is directed towards an
apparatus for correcting misregistration errors in a system for
detecting anomalies in a specimen such as a semiconductor wafer, said
system illuminating a surface of the specimen at a two-dimensional
array of locations in rows and columns. The apparatus comprises means
for generating a two-dimensional array of data samples in rows and
columns, each data sample in a row and column representing light
modified by the specimen at a corresponding location in the
two-dimensional array of locations. The apparatus further comprises
means for comparing the data samples of a target subarray and the
data samples of each of a plurality of reference subarrays, or
between signals derived therefrom, the target and the reference
subarrays having the same dimensions, one of the reference subarrays
being at a reference position and the remaining reference subarrays
being offset from the reference position by at least one row and/or
column of data samples to select a pair of offset values; and means
for repositioning the target subarray according to the pair of offset
values.

Another aspect of the invention is directed towards an apparatus for
correcting misregistration errors in a system for detecting anomalies
in a specimen such as a semiconductor wafer, said system illuminating
a surface of the specimen at a two-dimensional array of locations in
rows and columns. The apparatus comprises means for generating a
two-dimensional array of data samples in rows and columns, each data
sample in a row and column representing light modified by the
specimen at a corresponding location in the two-dimensional array of
locations. The apparatus further includes means for crosscorrelating
a target subarray and each of a plurality of reference subarrays of
data samples, or signals derived therefrom, to obtain a plurality of
sets of crosscorrelation values, the target and the reference
subarrays having the same dimensions, one of the reference subarrays
being at a reference position and the remaining reference subarrays
being offset from the reference position by at least one row and/or
column of data samples; and means for selecting the reference
subarray that corresponds to a set of crosscorrelation values
according to a criterion.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1A is a schematic view of an elliptical-shaped illuminated area
or spot on a surface to be inspected to illustrate the invention.

Fig. 1B is a graphical illustration of the illumination intensity
across the width or short axis of the elliptical spot of Fig. 1A for
defining a boundary of the spot to illustrate the invention, and also
illustrates a point spread function of the illumination beam.

Fig. 1C is a schematic view of three positions of an illuminated area
on a surface to be inspected to illustrate the scanning and data
gathering process of the system of this invention.

Fig. 2 shows partially in perspective and partially in block diagram
form a system for inspecting anomalies of a semiconductor wafer
surface to illustrate the preferred embodiment of the invention.

Fig. 3 is a perspective view showing in more detail the illumination
and collection features of the system of Fig. 2.

Fig. 4 is a schematic diagram of sweep and "mechanical" scan axes to
illustrate the data acquisition process of the invention.

Fig. 5 is a schematic view of a portion of a patterned wafer surface
illustrating the intersection of a strip unit with a die grid to
illustrate the data processing subsystem of this invention.

Fig. 6 is a schematic view illustrating a portion of a patterned
wafer surface illustrating a number of strip units and the anomaly
detection and verification processes of Figs. 7 and 8.

Fig. 7 is a system electronics functional block 
diagram of the present system of the invention. 

Fig. 8 is a functional block diagram of the data processing board
portion of the system of Fig. 7 to illustrate the preferred
embodiment of the invention.

Fig. 9 is a schematic top view of a semiconductor wafer having a
two-dimensional array of locations relative to a fast scan direction
and a direction of sweep of the AOD and a slow scan direction using a
mechanical stage to illustrate the invention.

Fig. 10A is a schematic view of reference and current vectors in
reference to a coordinate system in the alignment board of Fig. 7 to
illustrate the preferred embodiment of the invention.

Fig. 10B is a schematic view of the reference 
vector of Fig. 10A, and of average and residual 
reference data samples to illustrate the invention. 

Fig. 10C is a schematic view of reference and current vectors to
illustrate a process of crosscorrelation or residual minimization.

Fig. 11 is a functional block diagram for image registration using
normalized crosscorrelation of a current vector and a reference
vector for correcting misalignment performed by the alignment board
of Fig. 7 to illustrate one aspect of the invention.

Fig. 12 is a functional block diagram for correcting misregistration
using normalized residual minimization in the alignment board of
Fig. 7 to illustrate another aspect of the invention.

Fig. 13 is a schematic diagram to illustrate the generation of a
local average in a normalization process of a current vector and a
reference vector performed by the alignment board of Fig. 7 to
illustrate one feature of the invention in Fig. 12.

Fig. 14A is a schematic view of a two-dimensional array of data
samples in a target subarray taken from the two-dimensional array in
one of the strip units, such as strip unit N of Fig. 6.

Fig. 14B is a schematic view of a two-dimensional array of average
data samples obtained from the target array of Fig. 14A.

Fig. 14C is a schematic view of a two-dimensional subarray of data
samples taken from a two-dimensional array of data samples in a
reference strip unit, such as strip unit N-1 in Fig. 6.

Fig. 14D is a schematic view of a two-dimensional array of average
data samples obtained from the reference array of Fig. 14C.

Fig. 15 is a functional block diagram for image registration using
normalized residual minimization of a target subarray and a reference
subarray for correcting misalignment performed by the alignment board
of Fig. 7 to illustrate one aspect of the invention.

Figs. 16A and 16B are graphical plots illustrating two normalized
vectors to illustrate the effects of normalization in Figs. 12, 13
and 15.

Fig. 16C is a graphical plot of the normalized sums of residuals
versus offset to illustrate the processes in Figs. 12 and 15.

Figs. 16D-16G are computer plots of, respectively, a reference image,
a target image, a residual image before alignment and a residual
image after alignment using the alignment board of Fig. 7 to
illustrate the invention.

Fig. 17 is a functional block diagram for image registration using
normalized crosscorrelation of a target subarray and a reference
subarray for correcting misalignment performed by the alignment board
of Fig. 7 to illustrate one aspect of the invention.

Fig. 18 is a functional block diagram for interpolation of detected
intensity values to obtain the array of data samples of Fig. 9.

Fig. 19 shows partially in perspective and partially in block diagram
form the surface inspection system of Figs. 2 and 3, where the
illumination beam is modulated to compensate for variations of
detected light intensity caused by the optical characteristics of the
specimen apart from anomalies.

Fig. 20 is a functional block diagram for correcting image
misregistration in the system of Fig. 19 using adaptive illumination
and residual minimization.

For simplicity in description, identical components 
are labelled by the same numerals in this application. 

Fig. 1A is a schematic view of an elliptical-shaped illuminated area
(or spot) of a surface inspected by the system of this invention to
illustrate the invention. As explained below, the laser beam
illuminating the surface inspected approaches the surface at a
grazing angle, so that even though the illumination beam has a
generally circular cross-section, the area illuminated is elliptical
in shape such as area 10 in Fig. 1A. As known to those skilled in the
art, in light beams such as laser beams, the intensity of the light
typically does not have a flat distribution and does not fall off
abruptly to zero across the boundary of the spot illuminated, such as
at boundary 10a of spot 10 of Fig. 1A. Instead, the intensity falls
off at the outer edge of the illuminated spot at a certain inclined
slope, so that instead of sharp boundaries such as boundary 10a
illustrated in Fig. 1A, the boundary is typically blurred and forms a
band of decreasing intensity at increasing distance away from the
center of the illuminated area.
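
The elongation of the spot follows from simple projection geometry,
as the short sketch below shows. It assumes an ideal circular beam of
known diameter and an angle of incidence measured from the surface
normal; the function name and the numeric values are hypothetical.

```python
from math import cos, radians


def spot_axes(beam_diameter, incidence_deg):
    """Return (short_axis, long_axis) of the footprint of a circular beam.

    incidence_deg is the angle between the beam and the surface normal.
    The short axis equals the beam diameter; the long axis is stretched
    by 1 / cos(angle) along the plane of incidence.
    """
    return beam_diameter, beam_diameter / cos(radians(incidence_deg))


# A beam of unit diameter arriving 60 degrees from the normal.
short_axis, long_axis = spot_axes(1.0, 60.0)
```

At 60 degrees from the normal the long axis is about twice the short
axis, and the ratio grows as the beam approaches grazing incidence.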

In many lasers, the laser beam produced has a Gaussian intensity
distribution, such as that shown in Fig. 1B. Fig. 1B is a graphical
illustration of the spatial distribution of the illumination
intensity in the Y direction of a laser beam that is used in the
preferred embodiment to illuminate spot 10 of a surface to be
inspected as shown in Fig. 1A, and thus is also the illumination
intensity distribution across spot 10 in the Y direction. As shown in
Fig. 1B, the illumination intensity has been normalized so that the
peak intensity is 1, and the illumination intensity has a Gaussian
distribution in the X direction as well as in the Y direction. Points
12 and 14 are at spatial locations y1 and y5, at which points the
illumination intensity drops to 1/e² of the peak intensity, where e
is the base of the natural logarithm. The spot 10 is defined by the
area within a boundary 10a where the illumination is 1/e² of the
maximum intensity of illumination at the center of the spot. The
lateral extent of the spot 10 may then be defined to be the boundary
10a.

To maintain uniform detection sensitivity, the scanning light beam is
preferably caused to scan short sweeps having a spatial span less
than the dimension of the surface it is scanning, as illustrated in
the preferred embodiment in Fig. 2, where these short sweeps are not
connected together but are located so that they form arrays of
sweeps.

The surface inspection system of this application will now be
described with reference to Figs. 2 and 3. As shown in Fig. 2, system
20 includes a laser 22 providing a laser beam 24. Beam 24 is expanded
by beam expander 26 and the expanded beam 28 is deflected by
acousto-optic deflector (AOD) 30 into a deflected beam 32. The
deflected beam 32 is passed through post-AOD and polarization
selection optics 34 and the resulting beam is focused by telecentric
scan lens 36 onto a spot 10 on surface 40 to be inspected, such as
that of a semiconductor wafer, photomask or ceramic tile, patterned
or unpatterned.
In order to move the illuminated area that is focused onto surface 40
for scanning the entire surface, the AOD 30 causes the deflected beam
32 to change in direction, thereby causing the illuminated spot 10 on
surface 40 to be scanned along a sweep 50. As shown in Fig. 2, sweep
50 is preferably a straight line having a length which is smaller
than the dimension of surface 40 along the same direction as the
sweep. Even where sweep 50 is curved, its span is less than the
dimension of surface 40 along the same general direction. After the
illuminated spot has traversed along sweep 50, surface 40 of the
wafer is moved by XY stage 124 (Fig. 3) parallel to the X axis in
Fig. 2 so that the illuminated area of the surface moves along arrow
52, and AOD 30 causes the illuminated spot to scan a sweep 53
parallel to sweep 50 and in an adjacent position spaced apart from
sweep 50 by a small distance along the X axis. As described below,
this small distance is preferably equal to about one quarter of the
dimension of spot 10 in the X direction. This process is repeated
until the illuminated spot has covered strip 54; at this point in
time the illuminated area is at or close to the edge 54a. At such
point, the surface 40 is moved by XY stage 124 along the Y direction
by about the length of sweep 50 in order to scan and cover an
adjacent strip 56, beginning at a position at or close to edge 56a.
The surface in strip 56 is then covered by short sweeps such as 50 in
a similar manner until the other end or edge 56b of strip 56 is
reached, at which point surface 40 is again moved along the X
direction for scanning strip 58. This process is repeated prior to
the scanning of strips 54, 56, 58 and continues after the scanning of
such strips until the entire surface 40 is scanned. Surface 40 is
therefore scanned by scanning a plurality of arrays of sweeps, the
totality of which substantially covers the entire surface 40.

The deflection of beam 32 by AOD 30 is controlled by chirp generator
80 which generates a chirp signal. The chirp signal is amplified by
amplifier 82 and applied to the transducer portion of AOD 30 for
generating sound waves to cause deflection of beam 32 in a manner
known to those skilled in the art. For a detailed description of the
operation of the AOD, see "Acoustooptic Scanners and Modulators," by
Milton Gottlieb in Optical Scanning, ed. by Gerald F. Marshall,
Dekker 1991, pp. 615-685. Briefly, the sound waves generated by the
transducer portion of AOD 30 modulate the optical refractive index of
an acoustooptic crystal in a periodic fashion, thereby leading to
deflection of beam 32. Chirp generator 80 generates appropriate
signals so that after being focused by lens 36, the deflection of
beam 32 causes the focused beam to scan along a sweep such as sweep
50 in the manner described.

Chirp generator 80 is controlled by timing electronics circuit 84
which in the preferred embodiment includes a microprocessor. The
microprocessor supplies the beginning and end frequencies f1, f2 to
the chirp generator 80 for generating appropriate chirp signals to
cause the deflection of beam 32 within a predetermined range of
deflection angles determined by the frequencies f1, f2. The
illumination sensor optics 90 and adaptive illumination control 92
are used to detect and control the level of illumination of spot 10.
The optics 90 and adaptive illumination control 92 are explained in
detail below in reference to Figs. 19, 20.
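
For illustration, the relationship between the beginning and end
frequencies f1, f2 and the range of deflection angles can be sketched
as follows. The sketch assumes a linear chirp and the small-angle
relation in which the deflection angle is proportional to the
acoustic frequency. The function names and all numeric values are
hypothetical and are not taken from the preferred embodiment.

```python
def chirp_frequency(t, f1, f2, duration):
    """Instantaneous frequency of a linear chirp from f1 to f2 (assumed linear)."""
    return f1 + (f2 - f1) * t / duration


def deflection_angle(frequency, wavelength, acoustic_velocity):
    """Small-angle AOD deflection in radians: wavelength * frequency / velocity."""
    return wavelength * frequency / acoustic_velocity


# Hypothetical values: 50-100 MHz chirp over 10 microseconds,
# 488 nm light, 4200 m/s acoustic velocity in the crystal.
f1, f2, sweep_time = 50e6, 100e6, 10e-6
wavelength, velocity = 488e-9, 4200.0

start_angle = deflection_angle(chirp_frequency(0.0, f1, f2, sweep_time),
                               wavelength, velocity)
end_angle = deflection_angle(chirp_frequency(sweep_time, f1, f2, sweep_time),
                             wavelength, velocity)
scan_range = end_angle - start_angle  # angular extent of one sweep
```

Under these assumptions any departure of the chirp from a straight
frequency ramp displaces the spot from its expected position along
the sweep.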

Detectors such as detectors 110a, 110b, 111a, 111b of Figs. 2 and 3
collect light scattered by anomalies as well as the surface and other
structures thereon along sweeps such as sweep 50 and provide output
signals to a processor 130 in order to detect anomalies and analyze
their characteristics.

Fig. 3 is a perspective view of system 20 of Fig. 2 showing in more
detail the arrangement of the collection/detection channels to
illustrate the preferred embodiment. Surface 40 may be smooth (118)
or patterned (119). The angle between the incident focused beam 38
and the normal direction 150 to the surface 40 is preferably in the
range of about 10-85° and more preferably within the range of 50-80°;
in Fig. 3, this angle is labelled θ. The four channels of collection
are preferably at elevation angles α that will collect scattered
light from 3-30° from the plane of surface 40.

Fig. 1C is a schematic view of three positions of the illuminated
area on a surface to be inspected to illustrate the scanning and data
gathering process of system 20. As shown in Fig. 1C, at one instant
in time, beam 38 illuminates an area 10 on surface 40. Spot 10 is
divided into sixteen areas by grid lines x1-x5, y1-y5, where such
areas are referred to below as resolution elements. In this context,
the term "resolution element" is defined by reference to the taking
of data samples across the intensity distributions along the X and Y
axes, such as that in Fig. 1C, and by reference to subsequent data
processing. The resolution element that is bounded by grid lines x2,
x3 and y2, y3 is resolution element P shown as a shaded area in
Fig. 1C. If there is an anomaly in this resolution element P, and if
the light illuminating resolution element P has the intensity
distribution as shown in Fig. 1B with a high intensity level between
grid lines y2 and y3, light scattered by the anomaly will also have a
high intensity. However, as the beam moves along the Y axis so that
the area 10' is illuminated instead, resolution element P will still
be illuminated but at the lower intensity level of that between grid
lines y1 and y2 in Fig. 1B. Therefore, if the sampling rate employed
by the data processor 130 in Fig. 3 for processing light detected by
the collection or collector channels 110a, 110b, 111a, 111b is such
that a data sample is taken when the illuminating beam is in position
10 and when the illuminating beam is in position 10', then two data
samples will be recorded. Thus for any resolution element such as P,
a number of data points will be taken, one when the illumination is
at a higher level as illustrated by data point D2 in Fig. 1B and
another one when the illumination is at a lower level, illustrated at
data point D1 in Fig. 1B. If position 10 is not the starting position
of the sweep 50 illustrated in Figs. 3 and 4, then two prior samples
would have been taken prior to the time when the illuminating beam
illuminates the surface 40 in position 10, so that the processor
would have obtained two more data samples at points D3, D4
corresponding to the prior positions of the illuminating beam when
light of intensity values between grid lines y3, y4 and between y4,
y5 respectively illuminates such resolution element P (grid lines y1
through y5 would, of course, move with the location of the spot). In
other words, four separate data samples at points D1-D4 would have
been taken of the light scattered by an anomaly present in resolution
element P as the illumination beam illuminates resolution element P
when scanning along the Y direction.
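
The four samples D1-D4 can be pictured numerically with the sketch
below. It assumes a Gaussian profile normalized to a peak of 1, grid
lines y1 and y5 at the 1/e² points, four equal resolution elements
between them, and one sample taken at the center of each element. The
function names and the unit beam radius are assumptions of this
sketch.

```python
from math import exp


def element_center_offsets(n_elements=4, w=1.0):
    """Centers of the resolution elements relative to the spot center.

    w is the 1/e^2 radius, so the spot spans 2 * w from y1 to y5.
    """
    size = 2.0 * w / n_elements
    return [-w + (k + 0.5) * size for k in range(n_elements)]


def samples_for_element(n_elements=4, w=1.0):
    """Relative illumination seen by one resolution element as the spot
    steps across it one element at a time (Gaussian profile, peak 1)."""
    return [exp(-2.0 * (y / w) ** 2) for y in element_center_offsets(n_elements, w)]


offsets = element_center_offsets()
d1, d2, d3, d4 = samples_for_element()
```

With these assumptions the two inner samples are taken at roughly 88%
of peak intensity and the two outer samples at roughly 32%, which is
the pattern of higher and lower levels described for D2, D3 and D1,
D4.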

In most laser beams, the beam intensity has a Gaussian intensity
distribution not only in the Y direction but also in the X direction.
For this reason, after the illuminating beam completes the scanning
operation for scanning a sweep such as sweep 50 as shown in Fig. 2,
and when the illuminating beam returns to position 74 for scanning
the adjacent sweep 53 as shown in Fig. 2, it is desirable for the
illuminated area along sweep 53 to overlap that of sweep 50 so that
multiple samples or data points can again be taken along the -X
direction as well as along the Y direction. Therefore, when the
illumination beam is scanning along sweep 53 from starting position
74 as shown in Fig. 2, the area illuminated would overlap spot 10;
this overlapping spot is 10'' as shown in Fig. 1C, where the spot
10'' is displaced along the -X direction relative to spot 10 by one
quarter of the long axis of the ellipses 10 and 10''.

Point Spread Function 

The Gaussian intensity distribution illustrated in Fig. 1B is that of
the combined illumination and light collection system 20 of Fig. 2.
When system 20 is used to scan an object and to collect the light
scattered thereby, the point spread function obtained has a shape
similar to that shown in Fig. 1B. Fig. 1B therefore also illustrates
the point spread function obtained when system 20 is used to scan and
collect light scattered from an object. For this reason, in the
description above, the curve in Fig. 1B is used to illustrate the
data samples obtained by system 20 when scanning surface 40.

As discussed above in reference to Figs. 1B and 1C, the point spread
function is a function of both x and y resolution element positions.
In Fig. 2, the illumination system illuminates a small area of the
surface 40 and the intensity distribution of such illumination
essentially determines the point spread function of system 20.

Alternatively, in imaging type systems, such as those described in
U.S. Patent Nos. 4,532,650; 4,579,455; 4,805,123; and 4,926,489
referenced above, the illumination system illuminates a large area of
the surface, and light from the surface to be inspected is focused by
a light collection system in a manner similar to that of a camera. In
such imaging systems, the design of the light collection system
determines the point spread function of the combined illumination and
light collection system. The point spread functions obtained using
most such imaging systems also have Gaussian distributions of the
form shown in Fig. 1B, and the above description concerning scanning
and data acquisition can be generally applied to these imaging
systems. Therefore, in both types of systems, when the intensity of
the detected light is sampled, and when such sampled data is aligned
prior to the detection and verification of anomalies, the
characteristics of such point spread functions are taken into account
as described in detail below. In other words, the data processing
scheme described below, including the alignment process for
correcting misregistration, is applicable to both types of systems.

Terminology and Setting for Data Processing 

Anomalies are identified by comparing intensity levels or values from
corresponding resolution element locations of images of adjacent
instances of a repeating pattern (referred to as a strip unit and
described below in detail) on the wafer surface, where the repeating
pattern may be a real one such as that of memory or logic devices on
a semiconductor wafer. In the case of a non-patterned wafer, it is
sometimes useful to define an imaginary repeating pattern; both types
of patterns are referred to herein as repeating patterns, and strip
units can be defined with respect to such repeating patterns.
Scattered light data samples from corresponding locations of adjacent
strip units are buffered and compared, where the collection of data
samples in which anomalies are being sought is referred to as the
target array and the collection of reference data samples (against
which the target array is compared) is referred to as the reference
array.

The data processing portion of the system of this 
invention employs resolution elements of the order of 
several microns in size, which is considerably larger 
than the submicron resolution element size of the 
imaging type systems described above. Furthermore, in 
the preferred embodiment, by monitoring variations in 
the height of the surface to be inspected and correcting 
for such variations using an automatic positioning 
system such as that in U.S. Patent No. 5,530,550, it is 
possible to achieve local data registration of better 
than ±1 resolution element in accuracy. These factors 
are exploited in the data processing design to enable 
less expensive and more efficient detection and 
verification of anomalies. 

This invention employs strip-unit-to-strip-unit 
comparison (defined below) which assumes that the 
pattern on the wafer repeats in a periodic fashion. The 
fundamental repeating pattern unit on the wafer is a die 
and the two-dimensional spatial layout of die is called 
a die grid as shown in Fig. 5. In some instances, the 
fundamental repeating pattern may actually be a stepper 
field comprised of several die. 

The data processing portion of the system is 
defined relative to the current strip scan (as shown in 
Fig. 4). The data processing coordinate system is 
defined by the scan axis X and sweep axis Y defined 
above in reference to Fig. 2 and has units of sweeps 
along the scan axis X and units of resolution elements 
along the sweep axis Y. The fundamental repeating pattern 
unit along a strip scan is denoted a strip unit. As 
shown in Fig. 6, the height h of a strip unit is defined 
as the height (length) of a sweep resolution element and 
the width d of a strip unit is defined as the width of 
a die. A strip unit may be comprised of several die 
and/or die segments as shown in Fig. 5 where a die 
segment is a portion of a die contained within a strip 
unit. Therefore, the height of a strip unit may contain 
exactly one die, a segment of a die, segments of two 
die, several whole die, or several whole die bounded by 
segments of one or two die. In the case of a 
non-patterned wafer, it is sometimes useful to define a 
virtual die and a corresponding virtual die grid, so 
that the definition of the strip unit above applies to 
non-patterned wafers as well. Seven strip units are 
shown in Fig. 6, where four of the units are labelled 
N-2, N-1, N, and N+1. 

It is preferable for the illumination beam to be 
aligned with the real or virtual patterns on the surface 
inspected (e.g. with the "streets" between die 
patterns), so that the intensity values obtained by the 
light collection subsystem from adjacent patterns can be 
used for indicating anomalies. Such alignment is known 
to those skilled in the art and is discussed, for 
example, in U.S. Patent No. 4,898,471 to Stonestrom et 
al. 

Data Processing Subsystem 

Fig. 7 is a functional block diagram of the data 
processing subsystem 130 to illustrate the invention. 
During the scan of a surface, sweep generation by chirp 
generator 80 and data acquisition in the manner 
described above in reference to Figs. 1A-6 are 
synchronized with data processing by timing signals that 
are generated by timing electronics 84. The timing 
signals from board 84 are, in turn, synchronized to the 
X-stage encoder 135. Therefore, as the illuminating 
beam sweeps the wafer surface, the collected light 
signal is digitized by the analog board 134 and passed 
to the data processing board 136 through the alignment 
board 138 for processing. In order to process the 
collected light signal from four collection channels 
independently, each channel has its own analog board and 
data processing board, where all channels derive their 
timing information from a common timing electronics 
board 84. The timing electronics board 84 controls, 
among other functions, the inter-sweep distance, 
strip-unit width, and the number of strip units that will 
be acquired. An automatic positioning system (not shown in 
Figs. 2, 3) with electronics 99 such as that in U.S. 
Patent No. 5,530,550 is also employed. 

Data Processing Board 136 

Fig. 8 is a block diagram showing in more detail 
the data processing board 136 of Fig. 7. Analog board 
134 converts the intensity data from one of the 
collectors 110a, 110b, 111a, 111b to digitized data. 
The incoming digitized data from the analog and 
alignment boards is first stored in a data buffer 142 of 
a memory management unit 140 after a certain delay by 
delay element 144. The delay is to allow time for the 
board 136 to process the data stored in the buffer 142 
before it is over-written by new incoming data. Memory 
management unit 140 supplies the buffered and incoming 
data to four parallel paths: two detection stages 152, 
154 and two verification stages 162, 164. 

In the preferred embodiment, the detection and 
verification stages are stream-based processing stages 
that process data at a rate substantially equal to the 
average sampling rate of the system. The sampling rate 
is in turn synchronized with the scan rate of the 
surface as noted above. 

In reference to Figs. 6 and 8, the sampling data 
acquired for strip unit N-1 is compared to the sampling 
data acquired during a number of previous sweeps for the 
strip unit N-2 and is also compared to the intensity 
data that will be acquired and forthcoming from memory 
management unit 140 for the strip unit N. In such 
comparison, the sampling data for the strip units N-2 
and N are taken as the reference images for comparison 
with the target image in strip unit N-1. Where strip 
unit N is the target image, then units N-1 and N+1 are 
the respective reference images for comparison. 
Therefore, for each pair of strip units N-1, N, two 
comparisons are performed: one where N-1 is the target 
image and N is the reference image, and the other where 
N is the target image and N-1 is the reference image. 
After further processing by anomaly detection logic 172, 
174 and the event processing stage 180, the results of 
the comparisons are supplied to the system Central 
Processing Unit (CPU) 131 of Fig. 7 to be combined with 
other comparisons for anomaly detection and 
verification. For example, the further processed result 
of the comparison where N is the target image and N-1 is 
the reference image is supplied to the system CPU. At 
a later point in time, the intensity data for the strip 
unit N+1 is acquired and received by board 136 and is 
then used as a reference image to be compared to the 
target image in strip unit N. The further processed 
result of such comparison is also supplied to the system 
CPU to be combined with the result of comparison where 
N is the target image and N-1 is the reference image for 
anomaly detection and verification. 

The outputs of the detection and verification 
stages 152, 162 are supplied to anomaly detection logic 
172 for determining the presence of anomalies in strip 
unit N. In the same vein, detection and verification 
stages 154, 164 supply their outputs to anomaly 
detection logic 174 for determining the presence of 
anomalies in strip unit (N-1). The outputs of the 
detection logic 172, 174 are then processed by an event 
processing stage 180 which detects and characterizes 
events, where an event is defined as a connected region 
of (verified) anomalous resolution elements. The event 
processing stage 180 buffers event data and signals the 
availability of event data to the system CPU 131 through 
a download FIFO 182. 
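
The notion of an event as a connected region of anomalous 
resolution elements can be illustrated with a short 
Python sketch. It is not the logic of event processing 
stage 180; the function name find_events, the boolean 
mask input and the choice of 4-connectivity are all 
assumptions made for illustration only. 

```python
from collections import deque


def find_events(mask):
    """Group anomalous resolution elements into connected regions.

    mask: list of rows, truthy where an element was verified anomalous.
    4-connectivity is an assumption; the text does not specify it.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    events = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            events.append(region)
    return events
```

Each returned region is a list of (row, column) locations 
from which an event's size or extent could be derived. 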

In the preferred embodiment, a resolution element 
is considered anomalous if the difference between its 
sampling data value (i.e. intensity value) and that of 
the corresponding resolution element in the neighboring 
strip unit exceeds the expected variation due to system 
errors. A verification stage verifies the anomalous 
resolution elements in order to remove false-positives. 
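
A minimal Python sketch of this comparison is given 
below. It assumes that the target strip unit and its two 
neighboring reference strip units are available as 
equally sized arrays, that the expected variation due to 
system errors is a single threshold value, and that an 
element is retained only when it differs from both 
references. The function name detect_anomalies and the 
rule for combining the two comparisons are illustrative 
assumptions, not the implementation of stages 152-164. 

```python
import numpy as np


def detect_anomalies(prev_ref, target, next_ref, threshold):
    """Flag elements of `target` that differ from both neighbors.

    prev_ref, target, next_ref: intensity arrays for strip units
    N-1, N and N+1 (same shape). threshold: assumed scalar bound on
    the variation expected from system errors.
    """
    target = np.asarray(target, dtype=float)
    differs_prev = np.abs(target - np.asarray(prev_ref, dtype=float)) > threshold
    differs_next = np.abs(target - np.asarray(next_ref, dtype=float)) > threshold
    # Requiring both comparisons to agree (an assumption) rejects a
    # defect that actually lies in one of the reference strip units.
    return differs_prev & differs_next
```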

In the detection and verification processes described 
above, proper registration of data samples to the 
corresponding locations on the wafer surface is 
important. In the strip-unit-to-strip-unit 
comparison employed in the detection and verification 
processes, it is assumed that the delay introduced by 
delay element 144 would cause the resolution elements in 
a strip unit to be compared to those in an adjacent 
strip unit, such as strip unit N to strip unit N-1. The 
detection and verification processes described above 
compare the data samples acquired for the strip unit N 
to those acquired for strip unit N-1. If the data 
samples of either one or both strip units are misaligned 
or misregistered with respect to the resolution elements 
in such strip unit, this will introduce an error in the 
comparison process. As noted above, by means of the 
automatic positioning system in U.S. Patent No. 
5,530,550 as well as other design features, it is 
possible to achieve local data registration of better 
than ±1 resolution element in accuracy. With the 
further shrinking of semiconductor devices to smaller 
sizes, it would be desirable to be able to achieve 
better data registration accuracy. The invention of 
this application achieves such goal. 

The above detailed description in reference to 
Figs. 1-8 is largely taken from the parent application. 

Fig. 9 is a schematic top view of a semiconductor 
wafer having a two-dimensional array of locations 
relative to a fast scan direction (direction of sweep of 
the AOD) and a slow scan direction to illustrate the 
invention. As described above in reference to Fig. 1C, 
data samples collected correspond to the resolution 
elements such as resolution element P. For simplicity, 
such resolution elements are represented as dots 202 on 
the wafer 40. Alternatively, the dots 202 may be 
thought of as the center points of the resolution 
elements, such as center point $P_c$ of resolution element 
P in Fig. 1C, where the center point is at the same 
distance from opposing sides of the rectangle of the 
corresponding resolution element. For simplicity, each 
resolution element or its center point is referred to as 
a location herein. 

Therefore, the locations 202 on wafer 40 form a 
two-dimensional array of rows and columns, such as row 
204 and column 206. In the preferred embodiment, the 
rows of locations are along the fast scan direction of 
the sweep, that is, along the Y axis, while the columns 
and the slow scan direction are along the X axis. 

As described above, a data sample is acquired for 
each location on wafer 40 so that the data samples 
acquired corresponding to the locations also form a 
two-dimensional array of rows and columns. In the preferred 
embodiment, each row or a portion thereof forms a 
vector; it being understood that any one dimensional 
group of data samples may be defined as a vector, such 
as those in a column (e.g. column 206), or ones in a 
diagonal direction. Furthermore, the data samples in a 
group need not be all contiguous to one another; in 
other words, it is possible to exclude one or more data 
samples from a row, column or diagonal array of data 
samples from a group if desired. Thus, misregistration 
of the data samples with respect to the corresponding 
locations on the wafer would cause errors in anomaly 
detection and verification. One of the goals of this 
invention is to reduce such misregistration errors by 
misregistration correction. Since the point spread 
functions obtained using most illumination and detection 
optical systems are smooth and symmetrical, the data 
samples acquired from neighboring locations are usually 
highly correlated and this fact can be exploited in 
misregistration correction. 

One aspect of the invention is based on the 
observation that, for each vector, a reference vector 
may be formed by averaging vectors adjacent to the 
current vector being processed and then adjusting the 
locations of the data samples in the current vector by 
comparing the current vector to the reference vector in 
order to correct misregistration errors. This is 
illustrated in Fig. 10A. 

Fig. 10A is a schematic view of a current vector at 
the ith row and eight adjacent vectors of the current 
vector in reference to a coordinate system to illustrate 
the above described aspect of the invention involving a 
reference vector. Each horizontal line in rows (i-4) 
through (i+4) represents a vector (i.e. data samples 
along the row). Each vector may comprise the data 
samples acquired in a sweep as shown in Figs. 2 and 4 or 
a portion of such data samples. This invention is 
equally applicable to inspection systems where the beam 
for illumination scans across the entire wafer surface 
in one scan, so that a vector may comprise data samples 
corresponding to locations across the entire wafer 
surface. 

In order to increase the accuracy of anomaly 
detection and verification, it is desirable to first 
pre-process the data samples acquired to correct for 
misregistration errors and then supply the data samples 
after such correction to the detection and verification 
stages 152, 154, 162, 164 in Fig. 8 for anomaly 
detection and verification. The vectors in Fig. 10A 
represent data samples acquired after digitization, 
prior to misregistration correction and before they are 
sent to the detection and verification stages. The 
current vector at the ith row has a number of adjacent 
vectors as shown in Fig. 10A. As illustrated by the 
point spread function 210 obtained using an illumination 
and collection system such as system 20, the adjacent 
vectors at the (i-1)th through (i-3)th rows and the 
adjacent vectors at the (i+1)th through (i+3)th rows are 
highly correlated with the current vector at the ith row 
and this fact can be exploited for misregistration 
correction. A reference vector may be formed by taking 
the average of the eight rows, (i-3)th through (i+4)th. 
Thus, the data sample at the ith row and jth column of 
the reference vector for the above described current 
vector on the ith row may be formed by taking an average 
of the data samples at the jth column in the eight 
vectors at the (i-3)th through (i+4)th rows. 
Preferably, a weighted average of the data samples is 
taken, where the weighting coefficients preferably vary 
with the point spread function 210 obtained using the 
optical illumination/collection system. In the 
preferred embodiment, the reference vector G is given 
by: 

(1)    $$G_{i,j} = \sum_{q=-H/2}^{H/2-1} D_q \, Y_{i-q,j}$$

where $Y_{i,j}$ is the data sample value at the ith row, jth 
column of the two-dimensional array of data samples; $D_q$ 
are weighting coefficients; H is the number of selected 
vectors for generation of the reference vector; $G_{i,j}$ is 
the value of the reference vector at the ith row and jth 
column of the two-dimensional array; where j ranges from 
N/2 to -N/2 so that the current vector being processed, 
the reference vector and selected vectors each has N 
data samples. Preferably N ranges from 512 to 8192. 
Thus, in the example above, H is 8. 
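
The formation of the reference vector can be sketched in 
Python as follows. The sketch assumes, as in the example 
above with H = 8, that the selected rows are (i-3) 
through (i+4), and that one weighting coefficient is 
supplied per selected row in that order. The function 
name reference_vector and the array layout (rows along 
the first axis) are hypothetical. 

```python
import numpy as np


def reference_vector(samples, i, weights):
    """Weighted average over H rows around row i, column by column.

    samples: 2-D array of data samples (rows x columns).
    weights: H weighting coefficients, one per selected row, listed
             from the lowest row index to the highest (assumption).
    The selected rows are i - H/2 + 1 ... i + H/2, i.e. rows
    (i-3) through (i+4) when H = 8.
    """
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    h = weights.size
    first = i - h // 2 + 1
    if first < 0 or first + h > samples.shape[0]:
        raise ValueError("row i is too close to the edge of the array")
    # Each output column j is sum_k weights[k] * samples[first + k, j].
    return weights @ samples[first:first + h]
```

In practice the weights might be samples of the point 
spread function normalized to sum to one; uniform weights 
reduce the result to a plain average. 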

By comparing the current vector with the reference 
vector so formed, it is possible to correct for 
misregistration errors. In the above described scheme, 
however, by taking an average over vectors at different 
rows to form the reference vector, one would correct for 
misregistration errors that are in the slow scan 
direction X but not misregistration errors in the fast 
scan direction Y. To correct for misregistration errors 
in the Y direction also, in the preferred embodiment, 
each of the data samples in the current vector is 
compared, not just with the corresponding data sample of 
the reference vector in the same column, but also with 
the data samples in the reference vector in adjacent 
columns. In this comparison process, the best fit is 
found. If the best fit calls for repositioning the data 
sample to a different column, such data sample in the 
current vector is then repositioned to correct for 
misregistration errors. This comparison process may be 
implemented through normalized crosscorrelation or 
normalized residual minimization described below. 

Another factor involved in misregistration 
correction is that the background reflectivity of the 
surface independent of any anomaly may vary across the 
surface due to effects such as process variations, and 
such variation may interfere with the above-described 
process for misregistration correction. Therefore, to 
reduce the effects of such local variations, in the 
preferred embodiment, for each data sample in the 
current vector being processed, an average value for the 
data samples in the current vector and an average value 
for the data samples in a reference vector are 
computed. Preferably, the data samples in the current 
vector and in the reference vector from which the 
averages are computed are centered at the data sample in 
the current vector being processed. The differences 
between each of the data samples in the current and 
reference vectors and their respective average values 
then define respectively a residual current vector and 
a residual reference vector. This averaging process is 
illustrated in Fig. 10B described below. 

Fig. 10B is a schematic view of data samples in a 
reference vector to illustrate the process for 
generating a local average vector and residual reference 
vector from raw data. The dots in Fig. 10B refer to 
locations, or center points (e.g. $P_c$ in Fig. 1C) of 
resolution elements, and the values such as $G_{(i,5)}$ refer 
to the data samples detected at such resolution elements 
or locations. In order to generate the local average, 
the user needs to decide a window for generating the 
average value. The size of the window should be such 
that by averaging over the window, variations in the 
reflected intensity due to different reflectivities 
across the surface are reduced or eliminated. Thus, for 
example in Fig. 10B, the local average value $\bar{G}_{(i,5)}$ 
for the data sample at the ith row and the fifth column 
may be obtained by obtaining an average of eight data 
samples from the data sample $G_{(i,1)}$ in the first column 
through the data sample $G_{(i,8)}$ in the eighth column, all 
on the ith row, so that the averaging window A is eight. 
The above described averaging process is illustrated by the 
equation (2) below: 

(2)    $$\bar{G}_{(i,5)} = \frac{1}{8} \sum_{n=-4}^{3} G_{(i,n+5)}$$

(3)    $$G_{R(i,5)} = G_{(i,5)} - \bar{G}_{(i,5)}$$



In order to reduce the effects of the variations of 
reflectivity of the surface, the above average value is 
then subtracted from the data sample to obtain a 
residual data sample $G_{R(i,5)}$ at the ith row and fifth 
column in the equation (3) above. This process is 
repeated for other locations and the resulting vector is 
referred to as a residual vector. 
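
The local averaging and subtraction can be sketched in 
Python as below. The window is taken to span columns 
j - A/2 through j + A/2 - 1, which reproduces the example 
above (columns 1 through 8 for the fifth column when A is 
eight). The function names and the handling of the ends 
of the vector, where the first and last samples are 
simply repeated, are assumptions not found in the text. 

```python
import numpy as np


def local_average(vector, window=8):
    """Running mean over columns j - A/2 ... j + A/2 - 1 for every j."""
    v = np.asarray(vector, dtype=float)
    half = window // 2
    # Repeat the end samples so each column has a full window
    # (edge handling is an assumption).
    padded = np.pad(v, (half, window - half - 1), mode="edge")
    kernel = np.full(window, 1.0 / window)
    return np.convolve(padded, kernel, mode="valid")


def residual_vector(vector, window=8):
    """Subtract the local average from each data sample."""
    v = np.asarray(vector, dtype=float)
    return v - local_average(v, window)
```

A slowly varying background contributes almost equally to 
a sample and to its local average, so it largely cancels 
in the residual. 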

The above described process may be performed for 
each data sample in each reference vector as well as in 
each current vector to yield residual current and 
reference vectors. The processes of crosscorrelation 
and residual minimization will now be illustrated in 
reference to Fig. 10C. 

Fig. 10C shows a residual reference vector and a 
residual current vector for illustrating the 
crosscorrelation and residual minimization processes of 
this invention. Both the residual reference vector $G_{R(i)}$ 
and the residual current vector $Y_{R(i)}$ on the ith row are 
shown in Fig. 10C. When the data samples in the 
residual current vector $Y_{R(i)}$ are compared to those in 
the residual reference vector $G_{R(i)}$, one possible 
comparison is to compare each data sample at location 
(i,j) of $Y_{R(i)}$ to the data sample at the same location 
(i,j) in $G_{R(i)}$. If the data samples in the residual 
current vector are misregistered in the ±Y direction, 
however, there is likely to be a discrepancy between the 
data sample in the residual current vector at location 
(i,j) and the data sample in the residual reference 
vector at that location. This discrepancy can be 
detected by comparing data samples in the residual 
current vector also to data samples that are offset in 
the ±Y direction from such data samples in the residual 
reference vector. 

Thus, if the data sample at the resolution element 
location (i, 16) of the residual current vector is being 
processed, such data sample is compared not only to the 
data sample at the same location in the residual 
reference vector, but also to data samples at 
neighboring locations such as (i, 15), (i, 14) and (i, 
17). In order to determine whether the data sample in 
the residual current vector at (i, 16) is to be 
repositioned, not only is the data sample at such 
location in the current vector compared to data samples 
in the residual reference vector, but preferably 
data samples in the neighboring locations such as (i, 
14), (i, 15) and (i, 17) in the current vector are also 
compared to data samples that are offset by the same 
number of locations in the residual reference vector to 
exploit the high correlation between adjacent data 
samples. From this comparison process, it is then 
determined whether the data sample in the residual 
current vector at (i, 16) is misregistered and whether 
it should be repositioned. The comparison can entail a 
crosscorrelation or residual minimization process, or 
other similar or equivalent processes. 
The preferred embodiment employs crosscorrelation. 

In one example illustrated in Fig. 10C, in order to 
determine whether the data sample at (i, 16) should be 
repositioned, four data samples at locations (i, 14) 
through (i, 17) in the residual current vector are 
crosscorrelated with a plurality of sets of four data 
samples in the residual reference vector to yield a 
plurality of crosscorrelation coefficients, each set 
having a different offset m in the ±Y direction relative 
to the data samples of the residual current vector. 
Thus, where the offset variable m is -2 as shown in Fig. 
10C, the set of four data samples in the residual 
current vector at locations (i, 14) through (i, 17) are 
crosscorrelated with the four data samples in the 
residual reference vector at locations (i, 12) through 
(i, 15) to yield a crosscorrelation coefficient. This 
is performed by multiplying corresponding pairs of data 
samples connected by arrows 220, and summing the four 
products to obtain the sum $G_{R(i,12)}Y_{R(i,14)}$ + 
$G_{R(i,13)}Y_{R(i,15)}$ + $G_{R(i,14)}Y_{R(i,16)}$ + 
$G_{R(i,15)}Y_{R(i,17)}$. This sum of the four 
products is divided by the product of the square root of 
the sum of squares of $G_{R(i,12)}$, $G_{R(i,13)}$, $G_{R(i,14)}$, 
$G_{R(i,15)}$ with the square root of the sum of squares of 
$Y_{R(i,14)}$, $Y_{R(i,15)}$, $Y_{R(i,16)}$, $Y_{R(i,17)}$. This 
expression is given in the 
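equation below: 

(4)    $$C_{i,16}(-2) = \frac{G_{R(i,12)}Y_{R(i,14)} + G_{R(i,13)}Y_{R(i,15)} + G_{R(i,14)}Y_{R(i,16)} + G_{R(i,15)}Y_{R(i,17)}}{\sqrt{G_{R(i,12)}^2 + G_{R(i,13)}^2 + G_{R(i,14)}^2 + G_{R(i,15)}^2}\;\sqrt{Y_{R(i,14)}^2 + Y_{R(i,15)}^2 + Y_{R(i,16)}^2 + Y_{R(i,17)}^2}}$$

In equation (4) above, the crosscorrelation coefficient 
$C_{i,16}(-2)$ is for the data sample at location (i, 16) for 
the offset m of -2. 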
The crosscorrelation process is illustrated by the 
four arrows 220 in Fig. 10C. If the offset is -1 
instead, then the four data samples in the residual 
current vector at locations (i, 14) through (i, 17) will 
be crosscorrelated with the data samples at locations 
(i, 13) through (i, 16) in the residual reference vector 
as illustrated by arrows 222 in dotted lines. Of course 
these same four data samples in the residual current 
vector may be crosscorrelated with the data samples at 
the same four locations (i, 14) through (i, 17) of the 
residual reference vector where there is then no offset 
(m=0) between the two sets of data samples. 

The above process may be repeated with different 
offsets to the extent necessary or desired to cover the 
expected extent of misregistration errors. In other 
words, if the data sample at (i, 16) is not expected to 
be misregistered by more than two resolution elements in 
the -Y and +Y directions, then there is no need to 
perform the crosscorrelation process at offsets more 
than two resolution elements from (i, 16). There is 
then no need for the offset m to be less than -2 or 
greater than +2, so that a total of five 
crosscorrelation processes with five different offset 
values would be adequate. If, for example, the five 
crosscorrelation processes are carried out with the 
offset m being equal to -2, -1, 0, +1, +2, then the 
entire process would result in five crosscorrelation 
coefficients. The highest one of the five 
crosscorrelation coefficients corresponding to an offset 
m would indicate that there is maximum correlation 
between the residual reference vector with such offset 
and the residual current vector. 

In the example of Fig. 10C, if it turns out that 
the crosscorrelation coefficient where the offset m has 
the value -2 is the maximum of the five, this means that 
the data sample at location (i, 16) has been 
misregistered by 2 resolution elements in the -Y 
direction so that in order to correct for 
misregistration error, such data sample should be 
repositioned at location (i, 14). 
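
The offset search described above can be sketched in 
Python as follows, assuming the residual current and 
residual reference vectors for one row are already 
available as one-dimensional arrays and that the window 
is even. The function names crosscorrelation and 
best_offset are hypothetical, and returning zero when a 
window has no energy is an added assumption. 

```python
import numpy as np


def crosscorrelation(res_cur, res_ref, j, m, window=4):
    """Normalized crosscorrelation for column j at offset m.

    Current-vector samples j - W/2 ... j + W/2 - 1 are paired with
    reference-vector samples shifted by m columns.
    """
    half = window // 2
    lo, hi = j - half, j + half
    if lo < 0 or lo + m < 0 or hi > len(res_cur) or hi + m > len(res_ref):
        raise ValueError("window falls outside the vector")
    cur = np.asarray(res_cur[lo:hi], dtype=float)
    ref = np.asarray(res_ref[lo + m:hi + m], dtype=float)
    denom = np.sqrt(np.sum(ref ** 2)) * np.sqrt(np.sum(cur ** 2))
    # A window with no signal gives no basis for a fit (assumption).
    return float(np.sum(ref * cur) / denom) if denom > 0 else 0.0


def best_offset(res_cur, res_ref, j, max_offset=2, window=4):
    """Return the offset with the highest coefficient, and all coefficients."""
    coeffs = {m: crosscorrelation(res_cur, res_ref, j, m, window)
              for m in range(-max_offset, max_offset + 1)}
    return max(coeffs, key=coeffs.get), coeffs
```

With j = 16, a window of four and a best offset of -2, the 
sample would be repositioned to column j + m = 14, as in 
the example of Fig. 10C. 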

The crosscorrelation window W is chosen based on an 
estimate on the extent of correlation of the data 
samples in the neighborhood of the location (i, 16). 
The number illustrated in Fig. 10C, namely four, 
corresponds to the number of resolution elements along 
the Y direction in the illuminated spot 10 in Fig. 1C. 
Obviously, a number greater than four or less than four 
may be chosen to be the crosscorrelation window, and the 
number can be even or odd. All such variations are 
within the scope of the invention. 

Therefore, in general, the crosscorrelation 
coefficient for the data sample at the ith row and jth 
column at offset m may be obtained by the equations 
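below: 

(5)    $$\bar{Y}_{i,j} = \frac{1}{A} \sum_{p=-A/2}^{A/2-1} Y_{i,j+p}$$

(6)    $$\bar{G}_{i,j} = \frac{1}{A} \sum_{p=-A/2}^{A/2-1} G_{i,j+p}$$

where A is a size of a window within which averaging is 
performed; and $\bar{Y}_{i,j}$, $\bar{G}_{i,j}$ are the averaged 
values of $Y_{i,j}$, $G_{i,j}$ within the window of width A; 

(7)    $$C_{i,j}(m) = \frac{\sum_{l=-W/2}^{W/2-1} \left(G_{i,j+l+m} - \bar{G}_{i,j+l+m}\right)\left(Y_{i,j+l} - \bar{Y}_{i,j+l}\right)}{\sqrt{\sum_{l=-W/2}^{W/2-1} \left(G_{i,j+l+m} - \bar{G}_{i,j+l+m}\right)^2}\;\sqrt{\sum_{l=-W/2}^{W/2-1} \left(Y_{i,j+l} - \bar{Y}_{i,j+l}\right)^2}}$$

where $C_{i,j}(m)$ is a crosscorrelation coefficient at the ith 
row and jth column when the reference vector is in the 
same row but offset by m data samples relative to the at 
least one vector; and 

W is a size of a correlation window for the 
crosscorrelation coefficient. 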

The number of data samples in each vector may be in 
the range of 512 to 8192. The crosscorrelation window 
W may be in the range of about 4 to 64. 

This process is then repeated for each data sample 
in the current vector until all of the data samples in 
the current vector have been re-positioned where 
necessary to correct for misregistration errors. 
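
Repeating the search for every data sample in a vector 
can be sketched in Python as below. The sketch assumes an 
even window, skips samples too close to the ends of the 
vector for a full window at every offset, and leaves any 
location that receives no repositioned sample at its 
original value. These choices, and the function name 
reposition_vector, are illustrative assumptions; the 
text does not prescribe them. 

```python
import numpy as np


def reposition_vector(current, res_cur, res_ref, max_offset=2, window=4):
    """Reposition each sample of `current` by its best-fit offset.

    current: raw current vector; res_cur, res_ref: residual current
    and residual reference vectors of the same length.
    Returns the corrected vector and the offset chosen per column.
    """
    current = np.asarray(current, dtype=float)
    res_cur = np.asarray(res_cur, dtype=float)
    res_ref = np.asarray(res_ref, dtype=float)
    n, half = current.size, window // 2
    corrected = current.copy()
    offsets = np.zeros(n, dtype=int)
    # Only columns whose window fits at every offset are processed.
    for j in range(half + max_offset, n - half - max_offset + 1):
        cur = res_cur[j - half:j + half]
        best_c, best_m = -np.inf, 0
        for m in range(-max_offset, max_offset + 1):
            ref = res_ref[j - half + m:j + half + m]
            denom = np.sqrt(np.sum(ref ** 2) * np.sum(cur ** 2))
            c = np.sum(ref * cur) / denom if denom > 0 else 0.0
            if c > best_c:
                best_c, best_m = c, m
        offsets[j] = best_m
        corrected[j + best_m] = current[j]  # move sample to column j + m
    return corrected, offsets
```
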

Fig. 11 is a block diagram illustrating how the 
crosscorrelation process may be implemented. The 
functions in Fig. 11 may be implemented in hardware or 
as processing steps in a microprocessor, on the 
alignment board 138 in Fig. 7. As shown in Fig. 11, the 
scattered intensities from the locations 202 of Fig. 9 
detected by the detectors 110a, 110b, 111a, 111b of Fig. 
3 are digitized by the analog board 134 of Fig. 7 and 
supplied to the alignment board 138. Such data are then 
fed to two parallel path inputs 252, 254 of Fig. 11 as 
the inputs for the current vector and reference vector. 
The input digital data are stored in corresponding data 
buffers 256, 258 in the two paths. 

For each data sample $Y_{i,j}$ (at the resolution element 
location (i,j) of the current vector) input to the current 
vector data input 252, the calculation below is 
performed to determine whether such sample needs to be 
repositioned in order to correct for misregistration 
errors relative to such resolution element. The running 
average of data samples in the row of the vector is 
computed for both the current vector and the reference 
vector by corresponding generators 262, 264 in 
accordance with equations (5) and (6) above over an 
averaging window A in the manner described above in 
reference to Fig. 10B. These averages are computed for 
each data sample $Y_{i,j}$, $G_{i,j}$. The running average for such 
data sample in the current vector is then supplied to 
block 266 which computes residual data samples by 
subtracting from each data sample in the current vector 
the running average value computed by block 262 to 
arrive at a residual current vector. Block 268 performs 
a similar operation for deriving a residual reference 
vector. 

The residual current vector and the residual 
reference vector are delayed respectively by delays 272, 
274. As described above in reference to Fig. 10C, in 
computing the crosscorrelation coefficient, each 
residual data sample in the residual current vector is 
multiplied by a corresponding residual data sample in 
the residual reference vector. If there is no offset 
between such two residual data samples that are 
multiplied, then the delays introduced by blocks 272, 
274 may be substantially the same. In such event, the 
outputs of blocks 272, 274 would supply residual data 
samples from the residual current vector and residual 
reference vector obtained for the same resolution 
element location to multiplier 286 for deriving the 
product of the two residual data samples. The product 
is then stored in block 286 for addition to other 
similar products. If, however, there is to be an offset 
between the two residual data samples (from blocks 266, 
268) of the residual current and reference vectors that 
are multiplied, the two blocks 272, 274 will introduce 
different delays so that residual data samples with the 
appropriate offset therebetween from the two residual 
vectors are multiplied by multiplier 286. 

For achieving a negative offset (m being negative), 
block 274 introduces a delay greater than that 
introduced by block 272; for achieving a positive offset 
(m being positive), block 272 introduces a delay greater 
than that introduced by block 274. Both positive and 
negative offsets may be implemented by a fixed delay in 
block 274 and a variable delay in block 272. The 
products of corresponding pairs of data samples in the 
two residual vectors are then added in block 286 over 
the crosscorrelation window W as described above to 
arrive at a value for the numerator in equation (7) for 
the crosscorrelation coefficient for a particular value 
of the offset m. 

Blocks 276, 278 generate respectively the squares
of the residual data samples that are in the window W in
the residual current vector and the squares of the
residual data samples that are in the window W in the
residual reference vector. Blocks 282, 284 compute
respectively the square roots of the sums of the
squares from blocks 276, 278. The two
square roots are multiplied by multiplier 288 to arrive
at the denominator in equation 7. The numerator is
then divided by the denominator in divider 290 to provide
a value for the crosscorrelation coefficient at offset
m.
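
By way of illustration only, the computation performed by blocks 272 through 290 for a single value of the offset m may be sketched in Python as follows. The function name, its arguments, and the convention that the window W covers indices -W/2 to W/2-1 about the sample of interest are assumptions made for this sketch; equation 7 governs the actual computation, and the caller is assumed to keep all indices inside the vectors.

```python
import math


def crosscorrelation_coefficient(res_current, res_reference, center, m, window):
    """Normalized crosscorrelation of two residual vectors at offset m.

    res_current, res_reference: residual vectors (running average removed).
    center: index of the data sample of interest in the current vector.
    m: offset, in data samples, applied to the reference vector.
    window: number of samples W in the crosscorrelation window (assumed even).
    """
    half = window // 2
    numerator = 0.0
    sum_sq_current = 0.0
    sum_sq_reference = 0.0
    for l in range(-half, half):
        c = res_current[center + l]
        r = res_reference[center + l + m]
        numerator += c * r            # sum of products (numerator)
        sum_sq_current += c * c       # sum of squares, current vector
        sum_sq_reference += r * r     # sum of squares, reference vector
    denominator = math.sqrt(sum_sq_current) * math.sqrt(sum_sq_reference)
    # A flat window has zero residual energy; return 0.0 instead of dividing by 0.
    return numerator / denominator if denominator else 0.0
```

In this sketch a coefficient close to 1 indicates that the current vector matches the reference vector when the reference is read m samples further along the row.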

The above described process is repeated for
different values of the offset m to provide a
crosscorrelation coefficient corresponding to each of a
plurality of different values of the offset m. Block
292 determines, from this plurality of crosscorrelation
coefficients, the one that is maximum. The value of m
corresponding to such maximum crosscorrelation
coefficient is then determined by block 294, and this
value controls the variable delay block 296 to
introduce the appropriate amount of delay so that the
data sample in the current vector from input 252 will be
repositioned, if necessary, from location (i, j) to
location (i, j+m), m here being the value so determined,
and this data sample is then stored in a
register. After being delayed by a fixed delay 298, the
reference vector data is stored in a register also.
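
The selection made by blocks 292, 294 and the repositioning effected by the variable delay 296 may be pictured, in software terms, by the following Python sketch. The list-based repositioning is only an analogue of the hardware delay; the function names, the `fill` value and the handling of samples shifted past the end of a row are assumptions of the sketch.

```python
def select_offset(coefficients):
    """Return the offset m whose crosscorrelation coefficient is largest.

    coefficients: dict mapping each candidate offset m to its coefficient.
    If several offsets tie, the first one encountered is returned.
    """
    return max(coefficients, key=coefficients.get)


def reposition_row(row, offsets, fill=0):
    """Move the sample at column j to column j + offsets[j].

    Samples shifted outside the row are dropped; columns that receive no
    sample keep the value `fill`. If two samples land on the same column,
    the later one overwrites the earlier one.
    """
    corrected = [fill] * len(row)
    for j, sample in enumerate(row):
        target = j + offsets[j]
        if 0 <= target < len(row):
            corrected[target] = sample
    return corrected
```

For example, `select_offset({-1: 0.2, 0: 0.5, 1: 0.9})` picks the offset 1, and applying that offset to every sample of a row shifts the row by one column.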

Fig. 12 is a block diagram illustrating a system
for correcting misregistration to illustrate an
alternative embodiment of the invention employing
residual minimization. Instead of or in addition to the
crosscorrelation process described above in reference to
Fig. 11, misregistration errors can also be corrected by
means of a residual minimization process illustrated in
Fig. 12. The residual minimization process of Fig. 12
again may be implemented in hardware or by processing
steps of a microprocessor. In the embodiment of Fig.
11, the effects of different reflectivities across the
surface are reduced by deriving an average for the
reference vector and the current vector and subtracting
such average from each data sample to obtain a residual
reference vector or residual current vector. In the
embodiment of Fig. 12, in contrast, such effects are
reduced by generating a local average value of the data
samples surrounding a data sample in a reference vector
or current vector and dividing each data sample or a
signal derived therefrom in the current vector or
reference vector by such average to derive a normalized
reference vector or current vector.

Thus, as shown in Fig. 12, a stream of digitized
data samples from the analog board 134 is supplied to
inputs 302, 304 and buffers 306, 308. Block 312 then
generates a local average for each data sample in the
current vector from data samples in the vicinity of such
data sample. Preferably, the data samples from which
the average is derived surround such data sample and
enough neighboring data samples are included in the
averaging process to reduce errors caused by variations
in reflectivities across the surface. The data sample
in the current vector or a signal derived therefrom is
then divided by such average to yield a normalized data
sample. This process is performed for all the data
samples in the current vector so that a normalized
current vector is supplied by block 316 to delay 322.

Blocks 314, 318 perform functions similar to those
of blocks 312, 316 so that block 318 supplies a
normalized reference vector to delay 324. Delays 322,



WO 98/45685 



PCT/US98/06929 



324 function in substantially the same manner as blocks
272, 274 of Fig. 11 to introduce no relative delay or a
desired relative delay therebetween and therefore no
offset or a desired data sample offset between the
normalized current vector and the normalized reference
vector. Each normalized data sample within a window
(similar to the window W in the embodiment of Fig. 11
and equation 7 above) in the normalized reference vector
is subtracted from a corresponding normalized data
sample in a window in the normalized current vector by
subtractor 326 to arrive at a difference value for each
data sample within the window in the normalized current
vector. The absolute values of such differences are
added together by block 328 as a residual value for the
current vector and this residual value is supplied to
block 330.

The above described process is repeated for
different values of the offset between the normalized
reference and normalized current vectors in order to
generate a number of different residual values, each
corresponding to a different offset in a manner similar
to that described above in reference to Fig. 11. Block
330 detects which of these residual values is the
minimum and the offset value corresponding to it. This
offset value is then supplied to variable delay 332 for
delaying the normalized data sample of the current
vector so as to reposition the normalized data sample in
order to correct for misregistration errors. This
process is then repeated for some or all of the data
samples in the current vector.
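
The residual minimization of Fig. 12 may be sketched as follows, purely as an illustration. The sketch assumes that the vectors have already been normalized, that the window covers indices -W/2 to W/2-1 about the sample of interest, and that the candidate offsets run from -max_offset to +max_offset; all names are hypothetical.

```python
def residual_value(norm_current, norm_reference, center, m, window):
    """Sum of absolute differences between windowed samples at offset m."""
    half = window // 2
    return sum(
        abs(norm_current[center + l] - norm_reference[center + l + m])
        for l in range(-half, half)
    )


def offset_of_minimum_residual(norm_current, norm_reference, center,
                               window, max_offset):
    """Return the offset m giving the smallest residual value."""
    candidates = range(-max_offset, max_offset + 1)
    return min(
        candidates,
        key=lambda m: residual_value(norm_current, norm_reference,
                                     center, m, window),
    )
```

The offset returned plays the role of the value that block 330 supplies to variable delay 332.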

Fig. 13 is a schematic view illustrating a
normalization process, such as the one described above
in reference to Fig. 12. As shown in Fig. 13, in order
to normalize the data sample at the resolution element
location on the ith row and jth column, it may be
desirable to obtain an average of 16 data samples with
locations (marked by X in Fig. 13) ranging from (j-2) to
(j+1) in the Y direction and (i-2) to (i+1) in the X
direction. For this purpose, blocks 312, 314 preferably
generate average values by adding the 16 data samples
and dividing the sum by 16 and provide such average
value to block 316 or 318. Divider blocks 316, 318
preferably subtract from the data sample a fraction of
the average data sample value, and divide the difference by
the average to obtain a normalized data sample value.
Two equations similar in form to the one just described
are equations (9) and (10) shown below.
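
As an illustration of the normalization just described, the following Python sketch forms the 4 x 4 local average of Fig. 13 and then the normalized value. The default fraction `alpha=0.5` is a hypothetical value chosen only for the example; the sketch does not handle array edges or a zero average.

```python
def local_average(samples, i, j, size=4):
    """Average of the size x size samples around row i, column j.

    For size 4 the window spans rows i-2..i+1 and columns j-2..j+1,
    matching the locations marked in Fig. 13.
    """
    half = size // 2
    total = sum(
        samples[r][c]
        for r in range(i - half, i + half)
        for c in range(j - half, j + half)
    )
    return total / (size * size)


def normalize(samples, i, j, alpha=0.5, size=4):
    """Subtract a fraction alpha of the local average, then divide by it."""
    average = local_average(samples, i, j, size)
    return (samples[i][j] - alpha * average) / average
```

With a uniform array every normalized sample equals 1 - alpha, which shows how a slowly varying background reflectivity is flattened by this step.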

Strip Unit To Strip Unit Comparison 

In lieu of, or in addition to, the above described
embodiments for misregistration correction using a
reference vector, another scheme for misregistration
correction makes use of the fact that there may be
repeating patterns on the surface to be inspected so
that these repeating patterns can be exploited for
aligning the data samples with locations on the surface
to correct misregistration errors, in a scheme analogous
to the strip unit comparison described above in
reference to Figs. 6 and 8. Thus, if strip unit N is
thought to contain the same pattern as strip unit N-1,
the data samples in strip unit N may be used as a
reference and compared to the data samples in strip unit
N-1 for detecting misregistration errors. Therefore,
instead of supplying digitized data from the analog
board 134 from essentially the same locale on the
surface to be inspected as in the schemes of Figs. 11 and
12, digitized data samples collected from different
locales thought to contain the same pattern may be used
for comparison, the data samples from one locale
referred to as a reference array and the data samples
from the other locale referred to as a target array.

Figs. 14A-14D and 15 illustrate a normalization and
residual minimization process for misregistration
correction by comparing the target and reference arrays.
Fig. 14A is a schematic view of a two-dimensional array
of data samples and a target subarray 340 taken from the
two-dimensional array in one of the strip units, such as
strip unit N of Fig. 6. Fig. 14C is a schematic view of
a two-dimensional subarray 342 of data samples taken
from a two-dimensional array of data samples in a
reference strip unit, such as strip unit N-1 in Fig. 6.
From each of the two subarrays in Figs. 14A and 14C, a
local average data sample value may be derived at each
of the locations in the two subarrays 340, 342, and in
other subarrays, except at the edges of the arrays. In
a process similar to that described above in reference
to Fig. 13, for example, the local average value at the
location (3, 3) may be derived by adding the data
samples that are in the first row through the fourth row
and in the first column through the fourth column, or a
total of sixteen data samples, and dividing the sum by
16 in the equation given below:



(8)

$$\bar{T}_{(3,3)} = \frac{1}{16}\,\bigl[\,T_{(1,1)} + T_{(1,2)} + T_{(1,3)} + T_{(1,4)} + T_{(2,1)} + T_{(2,2)} + T_{(2,3)} + T_{(2,4)} + T_{(3,1)} + T_{(3,2)} + T_{(3,3)} + T_{(3,4)} + T_{(4,1)} + T_{(4,2)} + T_{(4,3)} + T_{(4,4)}\,\bigr]$$



for the target subarray. The local average value for
the location (3, 3) is labelled $\bar{T}_{(3,3)}$ and similar local
averages may be derived at other locations in the
subarray for both the target subarray and the reference
subarray. The arrays of local average values derived in
this process are shown in Figs. 14B and 14D.
Preferably, the subarrays in Figs. 14A and 14C are large
enough so that the edge effects do not significantly
affect the accuracy of the misregistration correction
process; for this reason, the subarrays are preferably
much larger than as shown in Figs. 14A, 14C.

As shown in Fig. 15, the data samples for the
target subarray and for the reference subarray are fed
respectively to inputs 352, 354 and are stored
respectively in memory buffers 356, 358. Local averages
are generated in the manner described above in reference
to Figs. 13 and 14A-14D by blocks 362, 364. Then, the
data sample at each location in the subarray is divided
by the local average value derived for such location by
blocks 366, 368 to obtain a normalized subarray of
target subarray data samples at the output of block 366
and a subarray of normalized reference data samples at
the output of block 368. In the preferred embodiment
shown in Figs. 14A, 14C, subarray 342 of the reference
array corresponds to the same resolution element
locations as target subarray 340 and defines a reference
position of the reference subarray; it will be
understood that other reference subarrays may also be
used to define the reference position. The remaining
reference subarrays that are to
be compared to the target subarray are then offset
relative to the reference position by at least one row
and/or column.

In the same manner as that described above by
reference to Figs. 11 and 12, in the preferred
embodiment, if there is to be no offset between the data
samples of a reference subarray and the reference
position, blocks 372, 374 would introduce the same
amount of delay. Thus, for target subarray 340,
reference subarray 342 comprising data samples at
corresponding locations in the reference array to the
target array preferably defines the reference position.
In the event that a subarray other than subarray 342 is
used for comparison (i.e. a reference subarray that is
offset from the reference position), then blocks 372,
374 introduce different relative delays between target
subarray 340 and such selected different reference
subarray. The subtract block 376 would subtract from
each data sample in the normalized target subarray the
data sample at the corresponding location from the
selected normalized reference subarray to obtain a
difference value which is supplied to block 378. Block
378 stores such difference values for each of the
locations of the target subarray and computes a sum of
the absolute values of the differences from the output
of block 376. This sum is then supplied to block 380.

The above process is then repeated where there is
a different offset between the reference subarray and
the reference position by one or more columns in the Y
direction, where m is the offset expressed as a
number of columns. As in the case of the reference
vector comparison, the range of values for m in this
context is also chosen to be the maximum expected range
of misregistration errors. Thus, if the misregistration
error in the Y direction is not expected to exceed two
resolution elements in the -Y or +Y directions, then the
value of m should be chosen to range from -2 to +2. In
such event, the delays introduced by blocks 374, 372 are
chosen such that the block 378 computes the sums by
reference to a reference subarray that is offset from
the reference position by one of the five relative
offset values m in the ±Y direction.

Different from the case of the current and
reference vectors, the reference subarrays may also be
misregistered relative to the reference position in the
X direction so that another offset variable n is used to
track the offset in the X direction. In such event, a
number of reference subarrays, each offset from the
reference position by at least one row (n) and/or column
(m), may be compared to the target subarray to correct
for misregistration errors. Thus if both m and n are
-1, the subarray 340 will be compared to the subarray 344
in Fig. 14C. In the same vein, if misregistration
errors in the X direction are not expected to exceed one
resolution element in the -X and +X directions, then n
may take on one of the three values: -1, 0, +1. Again,
the delays introduced by blocks 372, 374 are adjusted to
achieve such relative offsets. Thus, if both m and n
are expected to range from -1 to +1, the above described
process for deriving the sum of absolute values of the
differences would be repeated 9 times corresponding to
9 combinations of different values (3 each) of m and n.
Such 9 sums are then supplied to block 380 which detects
which of the 9 values is minimum. Such minimum residual
value and the offset values for m and n of the reference
subarray that give rise to such minimum residual value
would indicate the best match between the reference and
target subarrays. In order to correct the
misregistration errors, such offset values for m and n
are then supplied to variable delay block 382 which
delays the normalized target subarray by the appropriate
amount so as to correct for misregistration errors.
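
The search over the combinations of n and m may be sketched in Python as below. This is a minimal illustration that assumes already-normalized arrays held as lists of rows, a target subarray addressed by its top-left corner, and offsets small enough that every index stays inside the reference array; the function and parameter names are hypothetical.

```python
def residual_2d(target, reference, top, left, rows, cols, n, m):
    """Sum of absolute differences between the target subarray and the
    reference subarray shifted by n rows and m columns."""
    return sum(
        abs(target[top + i][left + j] - reference[top + i + n][left + j + m])
        for i in range(rows)
        for j in range(cols)
    )


def best_offsets(target, reference, top, left, rows, cols, max_n, max_m):
    """Return the (n, m) pair giving the minimum residual value."""
    candidates = [
        (n, m)
        for n in range(-max_n, max_n + 1)
        for m in range(-max_m, max_m + 1)
    ]
    return min(
        candidates,
        key=lambda nm: residual_2d(target, reference, top, left,
                                   rows, cols, nm[0], nm[1]),
    )
```

With `max_n` and `max_m` both equal to 1 the list of candidates has the 9 combinations mentioned above.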

In the preferred embodiment, in a normalization
process in blocks 362, 366 and 364, 368, it may be
desirable to first subtract from each data sample in the
target subarray and reference subarray a certain
percentage of the local average value to obtain a
difference and then divide such difference by the
average value to obtain a normalized data sample value.
The normalization processes above may be performed by
the following equations:

(9)

$$R^{N}_{i,j} = \frac{R_{i,j} - \alpha_R\,\bar{R}_{i,j}}{\bar{R}_{i,j}}$$

(10)

$$T^{N}_{i,j} = \frac{T_{i,j} - \alpha_T\,\bar{T}_{i,j}}{\bar{T}_{i,j}}$$

(11)

$$0 < \alpha_R < 1$$

(12)

$$0 < \alpha_T < 1$$

where $R^{N}_{i,j}$, $T^{N}_{i,j}$ are the normalized values respectively
of the data samples at the ith row and jth column of the
reference and target subarrays, $R_{i,j}$, $T_{i,j}$ are the values
respectively of the data samples at the ith row and jth
column of the reference and target subarrays, $\alpha_T$, $\alpha_R$ being
weighting factors, and $\bar{R}_{i,j}$, $\bar{T}_{i,j}$ are local averaged values
of the data samples at the ith row and jth column over
the reference and target subarrays respectively that are
given by equations (13) and (14), in which K is the size
of an averaging window.

The operations carried out by blocks 376, 378 are
given by:

(15)

$$H(n,m) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left|\, T^{N}_{i,j} - R^{N}_{i+n,\,j+m} \,\right|$$

where M is the number of data samples in each column of
the reference and target subarrays;

N is the number of data samples in each row of the
reference and target subarrays;

m is the number of columns by which the reference
subarray is offset from the reference position, where m
can be zero;

n is the number of rows by which the reference
subarray is offset from the reference position, where n
can be zero; and

H(n,m) is the residual value corresponding to a
reference subarray that is shifted by n rows and m
columns from the target subarray.

The effect of the normalization illustrated in Figs. 13
and 14A-14D is illustrated in Figs. 16A and 16B. Fig.
16A is a graphical plot of an unnormalized and a
normalized data vector A obtained from an inspected
surface of object 40 of Fig. 2 having a relatively
uniform background reflectivity. Fig. 16B is a
graphical plot of a normalized and an unnormalized data
sample vector B of an inspected surface of object 40
where the background reflectivity varies across a
portion of the surface. In Figs. 16A and 16B the
unnormalized vectors A, B are shown in solid lines 388,
390 and the normalized vectors are shown as dotted lines
392, 394. As shown in Figs. 16A and 16B, where the
reflectivity of the surface varies across the surface,
the normalization process will reduce errors caused by
such variations.

Fig. 16C is a graphical plot of the normalized sums
of residuals versus one of the offsets n, m. By
determining the offsets n, m that give rise to a minimum
residual value from the plot, misregistration errors can
be reduced or eliminated. Fig. 16C illustrates the
situation where the residual value is minimum with m or
n being zero.

Figs. 16D and 16E are, respectively, computer plots
of the data samples in a reference array and those in a
target array. As can be seen from the two plots, both
arrays contain scattering by patterns on the surface.
Fig. 16F is a computer plot of residual data samples in
the target array and Fig. 16G is a computer plot of the
data samples from Fig. 16F but after such data samples
have been repositioned for misregistration correction.

Fig. 17 is a block diagram for implementing
normalized crosscorrelation of a target subarray with a
reference subarray to illustrate another aspect of the
invention. The operation in Fig. 17 is analogous to
that of Fig. 11 for normalized crosscorrelation of a
current vector with a reference vector, except that in
Fig. 17, system 400 operates on a target subarray
instead of a current vector and on a reference subarray
instead of a reference vector. Thus, buffers 406, 408
would store respectively the target array and subarray
data samples and the reference array and subarray data
samples. Generators 412, 414 would generate the
average values $\bar{R}_{i,j}$, $\bar{T}_{i,j}$ according to equations 13 and 14
above and blocks 416 and 418 would calculate the
normalized target and reference subarray values in
accordance with equations 9-12 above. The remaining
blocks of Fig. 17 then perform essentially the same
functions as those of blocks 272-290 of Fig. 11 to
obtain a plurality of crosscorrelation coefficients
provided to block 442, where each coefficient
corresponds to a different pair of values n, m. Block
442 then selects the maximum crosscorrelation
coefficient and identifies the pair of values of n, m
that gives rise to such maximum crosscorrelation
coefficient. In response to such pair of values, block
444 then derives a control signal for controlling the
amount of delay to be applied by block 446 in order to
reposition the data samples in the target subarray where
necessary to correct for misregistration errors.
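
The two-dimensional search of Fig. 17 may be pictured by the following Python sketch, which computes a normalized crosscorrelation over a V x W window for each pair (n, m) and keeps the pair with the largest coefficient. The sketch is not a transcription of equation 16; the window convention (indices -V/2 to V/2-1 and -W/2 to W/2-1 about the sample of interest), the names, and the assumption that the inputs are already normalized are choices made for the example.

```python
import math


def crosscorrelation_2d(target, reference, row, col, n, m, v, w):
    """Normalized crosscorrelation of a V x W window at offsets (n, m)."""
    half_v, half_w = v // 2, w // 2
    numerator = sum_sq_target = sum_sq_reference = 0.0
    for k in range(-half_v, half_v):
        for l in range(-half_w, half_w):
            t = target[row + k][col + l]
            r = reference[row + k + n][col + l + m]
            numerator += t * r
            sum_sq_target += t * t
            sum_sq_reference += r * r
    denominator = math.sqrt(sum_sq_target) * math.sqrt(sum_sq_reference)
    return numerator / denominator if denominator else 0.0


def best_pair(target, reference, row, col, v, w, max_n, max_m):
    """Return the (n, m) pair with the maximum crosscorrelation coefficient."""
    pairs = [
        (n, m)
        for n in range(-max_n, max_n + 1)
        for m in range(-max_m, max_m + 1)
    ]
    return max(
        pairs,
        key=lambda nm: crosscorrelation_2d(target, reference, row, col,
                                           nm[0], nm[1], v, w),
    )
```

The pair returned corresponds to the values of n, m from which block 444 would derive its control signal.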

The operations performed by blocks 426 through 440
are illustrated by equation 16 below, which is analogous
to equation 7 above:

where m is the number of columns by which the
reference subarray is offset from the reference
position, where m can be zero;

n is the number of rows by which the reference
subarray is offset from the reference position, where n
can be zero; and

W is the crosscorrelation window in the ±Y
direction and V is the crosscorrelation window in the ±X
direction.

Again, the number of data samples in each row or 
column of an array may be in the range of 512 to 8192 
and the crosscorrelation windows W, V may be in the 
range of about 4 to 64. 

In some applications, it may be desirable to
process more data samples than are actually acquired.
Additional data can be produced by a process of
interpolation. This is illustrated in Fig. 18. Thus,
as a part of the analog board 134 of Fig. 7 or a part of
the alignment board 138 of the same figure, a
two-dimensional interpolation function may be employed for
deriving a larger output data sample array than the
input array. If the input array has M rows and N
columns of data samples, the output of block 450 may
contain KM rows and LN columns of data samples, where K,
L are positive integers. The data samples in the output
array of block 450 are then input to the systems shown in
Figs. 11, 12, 15 and 17.
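
The description does not specify the interpolation function of block 450. Purely as an example, a bilinear interpolation that turns an M x N input array into a KM x LN output array may be sketched as follows; the choice of bilinear interpolation, the edge handling (the last row and column are repeated) and the names are assumptions of the sketch.

```python
def interpolate_2d(samples, k, l):
    """Bilinear interpolation of an M x N array to KM rows and LN columns.

    The original samples are preserved at output positions (k*r, l*c).
    """
    rows, cols = len(samples), len(samples[0])
    output = []
    for r in range(rows * k):
        y = r / k
        r0 = int(y)
        r1 = min(r0 + 1, rows - 1)      # clamp at the last input row
        fy = y - r0
        out_row = []
        for c in range(cols * l):
            x = c / l
            c0 = int(x)
            c1 = min(c0 + 1, cols - 1)  # clamp at the last input column
            fx = x - c0
            top = samples[r0][c0] * (1 - fx) + samples[r0][c1] * fx
            bottom = samples[r1][c0] * (1 - fx) + samples[r1][c1] * fx
            out_row.append(top * (1 - fy) + bottom * fy)
        output.append(out_row)
    return output
```

For a 2 x 2 input and K = L = 2 the output has 4 rows and 4 columns, with the interpolated samples lying between the acquired ones.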

Adaptive Illumination

In an in-line high speed wafer inspection tool, an
important factor which limits the detection sensitivity
of the instrument is the background scattering from
patterns that are present on the wafer surface. This
background can vary over four orders of magnitude and it
is in the presence of this background that the
relatively small signals from anomalies of interest have
to be detected. Despite careful selection of the
illumination/collection polarization as well as spatial
filtering to reduce the background, such background can
still vary over too large a dynamic range, requiring
complex and expensive electronics and data processing to
extract the anomalies of interest. Thus, another aspect
of the invention is directed to the observation that the
dynamic range of background signal can be much reduced
by adaptively modulating the intensity of the
illuminating beam in response to the output of a
detector detecting the specular reflection or scattering
from the surface.

Fig. 19 is a system diagram illustrating a portion
of system 20 of Fig. 2 to illustrate the aspect of the
invention on adaptively modulating the intensity of the
illuminating beam. As shown in Figs. 2 and 19, detector
90 detects the specular reflection of beam 38 from
surface 40. An integrator 502 integrates the intensity
signal across a line of sweep 50 across the surface and
peak detector 504 detects the peak intensity detected
during such sweep. The integrated intensity and the
peak intensity are converted to digital signals by
converter 506 and supplied to a digital signal processor
510. In response to the integrated intensity from
integrator 502 and the peak intensity from 504, the
digital signal processor supplies a control signal for
controlling the attenuation to be applied by attenuator
512. This output of the processor 510 is converted by
digital to analog converter 514 before the signal is
applied to attenuator 512. Attenuator 512 attenuates
the output of chirp generator 520 before the generator
output is applied to amplifier 82 which, in turn, powers
the AOD 30. By reducing the voltage that is applied to
the AOD 30 when the specular reflection intensity
detected by detector 90 is high and increasing such
voltage when the detected intensity is low, electronics
92 reduces the dynamic range of the background signal
sensed by detector 90 and enhances the sensitivity of
the technique for detecting anomalies described above.

The digital signal processor 510 may store the
integrated intensity value from integrator 502 and/or
the peak intensity value from detector 504 from a prior
scan line for modulating the intensity of beam 38 for
scanning a subsequent scan line. The digital signal
processor 510 may also store other reference data (e.g.
average intensity values for a number of prior scans)
for comparison with the integrated value from integrator
502 and/or the peak value from detector 504 for
generating the control signal to attenuator 512. While
in the preferred embodiment, an AOD is used for
generating the scanning beam, the above described
observation applies to inspection systems employing
other means for generating the scanning optical beam.
While in the preferred embodiment, both the integrated
and peak intensities are employed for generating the
control signal for modifying the intensity of beam 38,
the use of only one of the two (the peak intensity and
the integrated intensity) may be adequate for some
applications. Instead of using the peak intensity or
the integrated intensity, an average intensity may also
be used in the place of or in addition to the other two
parameters.
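
The description leaves open how the digital signal processor 510 combines the integrated and peak intensities. One possible rule, given only as an assumption-laden sketch, is to blend the mean level of the prior scan line with its peak and to scale the drive for the next line so that the blended level approaches a chosen target. The target level, the blending weight and the gain limits below are hypothetical parameters, not values taken from the description.

```python
def gain_for_next_line(integrated, peak, samples_per_line, target_level,
                       peak_weight=0.5, min_gain=0.01, max_gain=1.0):
    """Relative drive level (1.0 = no attenuation) for the next scan line.

    integrated: integrated intensity over the prior scan line.
    peak: peak intensity detected during the prior scan line.
    """
    mean_level = integrated / samples_per_line
    level = (1 - peak_weight) * mean_level + peak_weight * peak
    if level <= 0:
        return max_gain               # no signal detected: use full drive
    gain = target_level / level
    # High detected intensity lowers the gain; low intensity raises it,
    # limited to the range the attenuator is assumed to support.
    return max(min_gain, min(max_gain, gain))
```

A bright prior line therefore yields a small gain (strong attenuation), while a dim line yields the maximum gain.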

Fig. 20 is a block diagram of a system for
correcting misregistration errors by means of residual
minimization and adaptive illumination. Where the
variations in reflectivity of the surface to be
inspected are either insignificant or have been
compensated for by adaptive
illumination in the manner described above in reference
to Fig. 19, the data samples may no longer need to be
first normalized in the manner described above in
reference to Fig. 15. In such event, the system of Fig.
15 may be simplified by removing the functional blocks
362-368 related to normalization. Similarly, the
functional blocks 262, 264, 266, 268 of Fig. 11, blocks
312-318 of Fig. 12 and blocks 412-418 of Fig. 17 may
also be omitted in the same way in such circumstances.

While the invention has been described above by
reference to different embodiments, it will be
understood that different modifications and changes may
be made without departing from the scope of the
invention which is to be defined only by the appended
claims and their equivalents.

WHAT IS CLAIMED IS:

1. A method for correcting misregistration errors
in a system for detecting anomalies in a specimen such
as a semiconductor wafer, said system illuminating a
surface of the specimen at a two-dimensional array of
locations in rows and columns; said method comprising
the steps of:

generating a two-dimensional array of data samples
in rows and columns, each data sample representing light
modified by the specimen at a corresponding location in
the two-dimensional array of locations of the sample;

defining one-dimensional groups of data samples,
and each group of data samples or portion thereof
defining a vector;

for at least one vector, providing a reference
vector that is an average of selected vectors of data
samples; and

processing the at least one vector and the
reference vector to correct for misregistration.

2. The method of claim 1, said defining step
defining each group of data samples to be a row of data
samples.

3. The method of claim 2, said providing step
providing said reference vector by computing a weighted
average of said selected vectors.

4. The method of claim 2, said processing step
being performed on data samples within a window along
said at least one vector, said window containing the
same or fewer number of data samples than the at least
one vector.

5. The method of claim 4, said at least one
vector having a number of data samples, said number
being in the range of about 512 to 8192.

6. The method of claim 1, said processing step
comprising processing different sets of data samples of
said reference vector, or signals derived therefrom,
wherein at least some data samples in such sets are
offset from the data samples of the at least one vector.

7. The method of claim 6, said defining step
defining each group of data samples to be a row of data
samples, wherein at least one of the sets of data
samples of the reference vector is offset by one or more
columns relative to the data samples in the at least one
vector.

8. The method of claim 6, wherein the processing
step comprises crosscorrelating each set of data samples
of said reference vector with said at least one vector
to obtain a crosscorrelation coefficient corresponding
to each of the sets.

9. The method of claim 8, wherein the processing
step derives a maximum value of the correlation
coefficient for each data sample in the at least one
vector.

10. The method of claim 9, further comprising
re-positioning each of at least some data samples of the at
least one vector so that it is identified by the
location, in the two-dimensional array of data samples,
of the corresponding data sample in the set of data
samples in the reference vector for which the
corresponding crosscorrelation coefficient is maximum.

11. The method of claim 6, wherein the processing
step comprises computing a residual value between each
set of data samples of said reference vector and said at
least one vector.

12. The method of claim 11, further comprising
comparing the residual values to obtain an offset value
corresponding to a minimum residual value.

13. The method of claim 11, further comprising
normalizing the data samples of said reference vector
and of the at least one vector prior to the computing
step, wherein the computing step computes a sum of
differences between data samples of each of said sets of
said normalized reference vector and the data samples of
said normalized at least one vector.

14. The method of claim 2, said obtaining step
deriving the reference vector from selected vectors
comprising data samples that are in different rows but
the same columns in the two-dimensional array of data
samples.

15. The method of claim 14, said providing step
providing a reference vector $G_{i,j}$ given by:

$$G_{i,j} = \sum_{k=-H/2}^{H/2-1} D_k\, Y_{i+k,\,j}$$

where $Y_{i,j}$ is the data sample value at the ith row, jth
column of the two-dimensional array;

$D_k$ are weighting coefficients;

H is the number of said selected vectors for
generation of the reference vector;

$G_{i,j}$ is the value of the reference vector at the ith
row jth column of the two-dimensional array;

where j ranges from (N/2-1) to -N/2 so that the at
least one vector, reference vector and selected vectors
each has N data samples.

16. The method of claim 15, said processing step
including crosscorrelating selected data samples of the
at least one vector with selected sets of data samples
of the reference vector according to the following
equations:

$$\bar{Y}_{i,j} = \frac{1}{A} \sum_{p=-A/2}^{A/2-1} Y_{i,(j+p)}$$

$$\bar{G}_{i,j} = \frac{1}{A} \sum_{p=-A/2}^{A/2-1} G_{i,(j+p)}$$

where A is a size of a window within which averaging is
performed; and $\bar{Y}_{i,j}$, $\bar{G}_{i,j}$ are the averaged values of $Y_{i,j}$, $G_{i,j}$
within the window of width A;

$$C_{i,j}(m) = \frac{\sum_{l=-W/2}^{W/2-1} \bigl(Y_{i,j+l} - \bar{Y}_{i,j+l}\bigr)\bigl(G_{i,j+l+m} - \bar{G}_{i,j+l+m}\bigr)}{\sqrt{\sum_{l=-W/2}^{W/2-1} \bigl(Y_{i,j+l} - \bar{Y}_{i,j+l}\bigr)^{2}}\;\sqrt{\sum_{l=-W/2}^{W/2-1} \bigl(G_{i,j+l+m} - \bar{G}_{i,j+l+m}\bigr)^{2}}}$$

where $C_{i,j}(m)$ is a crosscorrelation coefficient at the ith
row and jth column when the reference vector is in the
same row but offset by m data samples relative to the at
least one vector; and

W is a size of a correlation window for the
crosscorrelation coefficient.

17. The method of claim 16, wherein W includes a 
number of data samples, said number being in the range 
of about 4 to 64. 
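
A windowed crosscorrelation of this kind can be sketched as follows. The sketch assumes a zero-mean, normalized crosscorrelation over W samples (l = -W/2 to W/2-1) centred on column j; the exact expression used in the claim may differ, and `crosscorr` and `best_shift` are hypothetical names.

```python
import math


def crosscorr(y_row, g_row, j, m, w):
    """Normalized crosscorrelation of y_row with g_row shifted by m samples.

    The window holds w samples, l = -w/2 .. w/2 - 1, centred on column j.
    """
    half = w // 2
    if j - half < 0 or j + half > len(y_row):
        raise IndexError("window falls outside the vector")
    if j - half + m < 0 or j + half + m > len(g_row):
        raise IndexError("shifted window falls outside the reference")
    offsets = range(-half, half)
    ys = [y_row[j + l] for l in offsets]
    gs = [g_row[j + l + m] for l in offsets]
    y_mean = sum(ys) / len(ys)
    g_mean = sum(gs) / len(gs)
    num = sum((a - y_mean) * (b - g_mean) for a, b in zip(ys, gs))
    den = math.sqrt(sum((a - y_mean) ** 2 for a in ys)
                    * sum((b - g_mean) ** 2 for b in gs))
    return num / den if den else 0.0  # flat windows carry no information


def best_shift(y_row, g_row, j, w, max_shift):
    """Offset m in [-max_shift, max_shift] with the largest coefficient."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda m: crosscorr(y_row, g_row, j, m, w))
```

A small window such as W = 4 follows local detail but is sensitive to noise, while a window toward 64 samples averages more noise at the cost of locality.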

18. The method of claim 2, said generating step
generating sequentially rows of said data samples and
the at least one vector and the reference vector,
wherein said providing step comprises causing a relative
time delay between a set of data samples of the
reference vector and selected data samples of the at
least one vector.

19. The method of claim 2, further comprising
repeating the providing and processing steps for a
plurality of different vectors in the two-dimensional
array.

20. The method of claim 19, wherein at least two 
of the plurality of vectors are in the same row and are 
each shorter than the row. 

21. The method of claim 19, wherein the processing
step re-positions at least some data samples in each of
at least some of the plurality of vectors in the
two-dimensional array so that it is identified by the
location, in the two-dimensional array of data samples,
of a corresponding set of data samples in the reference
vector, and derives at least one reference vector from
one or more re-positioned vectors when the providing and
processing steps are repeated in a recursive operation.

22. The method of claim 19, wherein the processing
step re-positions at least some data samples in each of
at least some of the plurality of vectors in the
two-dimensional array so that it is identified by the
location, in the two-dimensional array of data samples,
of a corresponding set of data samples of the reference
vector, wherein the processing step derives a reference
vector from one or more vectors that have not been
re-positioned when the providing and processing steps are
repeated in a non-recursive operation.

23. The method of claim 1, wherein the processing
step compares at least some data samples of the at least
one vector and those of the reference vector to select
data samples of the reference vector that best match
those of the at least some data samples of the at least
one vector according to a criterion and the offset
therebetween, said method further comprising re-positioning
said at least some data samples in the at least one
vector according to said offset.

24. The method of claim 23, further comprising
repeating the providing, processing and re-positioning
steps for a plurality of different vectors in the
two-dimensional array.
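
The match-then-reposition sequence can be sketched as below. The sketch assumes the matching criterion is a minimum mean absolute difference and that samples shifted out of the vector are discarded while vacated positions receive a fill value; both choices, and the function names, are illustrative assumptions.

```python
def best_offset(vector, reference, max_offset):
    """Offset m minimizing the mean |vector[j] - reference[j + m]|.

    max_offset is assumed to be smaller than the vector length so that
    every candidate offset leaves some overlapping samples.
    """
    def cost(m):
        diffs = [abs(v - reference[j + m])
                 for j, v in enumerate(vector)
                 if 0 <= j + m < len(reference)]
        return sum(diffs) / len(diffs)
    return min(range(-max_offset, max_offset + 1), key=cost)


def reposition(vector, offset, fill=0.0):
    """Return a copy in which sample j is moved to index j + offset."""
    out = [fill] * len(vector)
    for j, v in enumerate(vector):
        k = j + offset
        if 0 <= k < len(vector):
            out[k] = v
    return out
```

Repeating the two calls for each vector in the array corresponds to the repetition recited in the dependent claim.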

25. The method of claim 2, said system
illuminating the surface of the specimen by scanning a
light beam along the rows of locations, wherein said
generating step generates the data samples by detecting
light scattered or reflected by the surface.

26. The method of claim 25, said method further
comprising detecting the light scattered or reflected by
the surface and modifying intensity of the light beam in
response to the scattered or reflected light that is
detected.

27. The method of claim 25, said detecting step 
including increasing contrast from any pattern on the 
surface of the specimen. 

28. The method of claim 25, said light beam having
a point spread function, wherein said providing step
provides said reference vector by computing a weighted
average of vectors with weights that are functions of
the point spread function of the light beam.

29. The method of claim 2, further comprising
collecting intensity values related to light modified by
the specimen at a plurality of sites, wherein said
generating step includes interpolating said intensity
values to obtain said two-dimensional array of data
samples.
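
One way to picture the interpolation step is resampling intensity values collected at irregular sites along a scan line onto a uniform grid. The sketch below assumes piecewise-linear interpolation with end values held constant outside the sampled range; the claim does not specify the interpolation method, and `resample_line` is a hypothetical name.

```python
from bisect import bisect_right


def resample_line(positions, values, grid):
    """Linearly interpolate (positions, values) at each grid coordinate.

    positions must be strictly increasing. Grid points outside the
    sampled range take the nearest end value (an assumption).
    """
    out = []
    for x in grid:
        if x <= positions[0]:
            out.append(values[0])
        elif x >= positions[-1]:
            out.append(values[-1])
        else:
            k = bisect_right(positions, x)  # positions[k-1] <= x < positions[k]
            x0, x1 = positions[k - 1], positions[k]
            t = (x - x0) / (x1 - x0)
            out.append(values[k - 1] + t * (values[k] - values[k - 1]))
    return out
```

Applying the same routine row by row, and then column by column if needed, would yield a regular two-dimensional array of data samples.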

30. A method for correcting misregistration errors
in a system for detecting anomalies in a specimen such
as a semiconductor wafer, said system illuminating a
surface of the specimen at a two-dimensional array of
locations in rows and columns; said method comprising
the steps of:

generating a two-dimensional array of data samples
in rows and columns, each data sample in a row and
column representing light modified by the specimen at a
corresponding location in the two-dimensional array of
locations;

comparing the data samples of a target subarray and
the data samples of each of a plurality of reference
subarrays, or between signals derived therefrom, the
target and the reference subarrays having the same
dimensions, one of the reference subarrays being at a
reference position and the remaining reference subarrays
being offset from the reference position by at least one
row and/or column of data samples to select a pair of
offset values; and

repositioning the target subarray according to the
pair of offset values.

31. The method of claim 30, said comparing step
including computing a residual value between the data
samples of a target subarray and the data samples of
each of a plurality of reference subarrays, or between
signals derived therefrom, to obtain a plurality of
residual values each corresponding to one of said
plurality of reference subarrays.

32. The method of claim 31, said comparing step
including determining a pair of offset values
corresponding to a minimum residual value, said
re-positioning step repositioning the target subarray
according to such pair of offset values.
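
The two-dimensional search for a minimum residual can be sketched as follows. The sketch assumes the residual is a sum of absolute differences and that candidate reference subarrays are cut from a larger frame at a nominal position plus row and column offsets; `residual`, `best_offsets` and the `max_shift` parameter are hypothetical.

```python
def residual(target, frame, top, left):
    """Sum of |target - frame| over the subarray of frame at (top, left)."""
    return sum(abs(t - frame[top + i][left + j])
               for i, row in enumerate(target)
               for j, t in enumerate(row))


def best_offsets(target, frame, top, left, max_shift):
    """Pair (n, m) of row and column offsets giving the minimum residual.

    (top, left) is the reference position; offsets that would place the
    subarray outside the frame are skipped.
    """
    rows, cols = len(target), len(target[0])
    best = None
    for n in range(-max_shift, max_shift + 1):
        for m in range(-max_shift, max_shift + 1):
            r0, c0 = top + n, left + m
            if (r0 < 0 or c0 < 0 or r0 + rows > len(frame)
                    or c0 + cols > len(frame[0])):
                continue
            h = residual(target, frame, r0, c0)
            if best is None or h < best[0]:
                best = (h, n, m)
    if best is None:
        raise ValueError("no candidate subarray fits inside the frame")
    return best[1], best[2]
```

The returned pair would then drive the repositioning of the target subarray.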

33. The method of claim 31, said computing step
also including deriving a normalized target subarray
from said target subarray and deriving a plurality of
normalized reference subarrays from said plurality of
reference subarrays, wherein the residual values are
obtained by calculating the residual value between the
normalized target subarray and each of the plurality of
normalized reference subarrays.

34. The method of claim 33, said deriving step
including calculating, for each of at least some of the
data samples in the target subarray and/or reference
subarrays, a corresponding average value of data
samples.

35. The method of claim 34, said deriving step
including dividing each of at least some of such data
samples or signals derived therefrom in a target or
reference subarray by its corresponding average value to
obtain said normalized target or reference subarray.

36. The method of claim 35, said deriving step
including subtracting from each of at least some of such
data samples in a target or reference subarray a
fraction of its corresponding average value to obtain a
difference and dividing such difference by its
corresponding average value to obtain said normalized
target or reference subarray.
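
The subtract-a-fraction-then-divide normalization can be illustrated with a short sketch. It assumes an unweighted local average over a K-by-K window centred on each sample and clipped at the subarray borders; the window shape, the border handling and the names `local_average` and `normalize_subarray` are assumptions.

```python
def local_average(data, i, j, k):
    """Unweighted mean over a k-by-k window centred on (i, j), clipped."""
    half = k // 2
    rows = range(max(0, i - half), min(len(data), i + half + 1))
    cols = range(max(0, j - half), min(len(data[0]), j + half + 1))
    vals = [data[r][c] for r in rows for c in cols]
    return sum(vals) / len(vals)


def normalize_subarray(data, alpha, k):
    """Return (x - alpha * avg) / avg for every sample x, with 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    out = []
    for i, row in enumerate(data):
        out_row = []
        for j, x in enumerate(row):
            avg = local_average(data, i, j, k)
            out_row.append((x - alpha * avg) / avg if avg else 0.0)
        out.append(out_row)
    return out
```

Dividing by the local average removes slowly varying gain, so a target and a reference that differ only by a brightness factor normalize to the same values.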

37. The method of claim 35, said calculating step
calculating the corresponding average value of data
samples that is a weighted average of data samples.

38. The method of claim 37, wherein said
calculating step calculates, for each of at least some
of the data samples in the target subarray and/or
reference subarrays, a corresponding average value of
data samples in a number of rows and a number of columns
from such data sample.

39. The method of claim 38, said system
illuminating the surface of the specimen by scanning a
light beam along the rows of locations, said light beam
characterized by a point spread function having a
lateral extent, wherein said data samples in said number
of rows and said number of columns from which said
corresponding average value is calculated are within the
lateral extent of the point spread function of the light
beam.

40. The method of claim 35, said deriving step
deriving the normalized target subarray from said target
subarray and the plurality of normalized reference
subarrays from said plurality of reference subarrays
according to the following equations:

$$R^{N}_{i,j} = \frac{R_{i,j} - \alpha_R \bar{R}_{i,j}}{\bar{R}_{i,j}}$$

$$T^{N}_{i,j} = \frac{T_{i,j} - \alpha_T \bar{T}_{i,j}}{\bar{T}_{i,j}}$$

$$0 < \alpha_R < 1$$

$$0 < \alpha_T < 1$$

where $R^{N}_{i,j}$, $T^{N}_{i,j}$ are the normalized values respectively
of the data samples at the ith row and jth column of the
reference and target subarrays, $R_{i,j}$, $T_{i,j}$ are the values
respectively of the data samples at the ith row and jth
column of the reference and target subarrays, $\alpha_T$, $\alpha_R$ being
weighting factors, and $\bar{R}_{i,j}$, $\bar{T}_{i,j}$ are local average values
of the data samples over the reference and target
subarrays respectively that are given by:

[equations for $\bar{R}_{i,j}$ and $\bar{T}_{i,j}$ not recoverable from the source]

where K is size of an averaging window.

41. The method of claim 40, wherein said residual
value is computed according to the following equation:

$$H(n,m) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left| R^{N}_{(i+n),(j+m)} - T^{N}_{i,j} \right|$$

where M is the number of data samples in each column of
the reference and target subarrays;

N is the number of data samples in each row of the
reference and target subarrays;

m is the number of columns by which the reference
subarray is offset from the reference position, where m
can be zero;

n is the number of rows by which the reference
subarray is offset from the reference position, where n
can be zero; and

H(n,m) is the residual value corresponding to a
reference subarray that is shifted by n rows and m
columns from the reference position.

42. The method of claim 30, further comprising
collecting intensity values related to light modified by
the specimen at a plurality of sites, wherein said
generating step includes interpolating said intensity
values to obtain said two-dimensional array of data
samples.

43. A method for correcting misregistration errors
in a system for detecting anomalies in a specimen such
as a semiconductor wafer, said system illuminating a
surface of the specimen at a two-dimensional array of
locations in rows and columns; said method comprising
the steps of:

generating a two-dimensional array of data samples
in rows and columns, each data sample in a row and
column representing light modified by the specimen at a
corresponding location in a row and column of locations;

crosscorrelating a target subarray and each of a
plurality of reference subarrays of data samples, or
signals derived therefrom, to obtain a plurality of sets
of crosscorrelation values, the target and the reference
subarrays having the same dimensions, one of the
reference subarrays being at a reference position and
the remaining reference subarrays being offset from the
reference position by at least one row and/or column of
data samples; and

selecting the reference subarray that corresponds
to a set of crosscorrelation values according to a
criterion.

44. The method of claim 43, further comprising
deriving a normalized target subarray from said target
subarray and deriving a plurality of normalized
reference subarrays from said plurality of reference
subarrays, wherein the crosscorrelating step
crosscorrelates the normalized target subarray and each
of the plurality of normalized reference subarrays.

45. The method of claim 44, said deriving step
including calculating, for each of at least some of the
data samples in the target subarray and/or reference
subarrays, a corresponding average value of data
samples.

46. The method of claim 45, said deriving step
including dividing each of at least some of such data
samples or signals derived therefrom in a target or
reference subarray by its corresponding average value to
obtain said normalized target or reference subarray.

47. The method of claim 46, said deriving step
including subtracting from each of at least some of such
data samples in a target or reference subarray a
fraction of its corresponding average value to obtain a
difference and dividing such difference by its
corresponding average value to obtain said normalized
target or reference subarray.

48. The method of claim 46, said calculating step
calculating the corresponding average value of data
samples that is a weighted average of data samples.

49. The method of claim 48, wherein said
calculating step calculates, for each of at least some
of the data samples in the target subarray and/or
reference subarrays, a corresponding average value of
data samples in a number of rows and a number of columns
from such data sample.



50. The method of claim 49, said system
illuminating the surface of the specimen by scanning a
light beam across the rows of locations in a raster
pattern, said light beam characterized by a point spread
function having a lateral extent, wherein said data
samples in said number of rows and said number of columns
from which said corresponding average value is calculated
are within the lateral extent of the point spread function
of the light beam.

51. The method of claim 47, said deriving step
deriving the normalized target subarray from said target
subarray and the plurality of normalized reference
subarrays from said plurality of reference subarrays
according to the following equations:

$$R^{N}_{i,j} = \frac{R_{i,j} - \alpha_R \bar{R}_{i,j}}{\bar{R}_{i,j}}$$

$$T^{N}_{i,j} = \frac{T_{i,j} - \alpha_T \bar{T}_{i,j}}{\bar{T}_{i,j}}$$

$$0 < \alpha_T < 1$$

$$0 < \alpha_R < 1$$

where $R^{N}_{i,j}$, $T^{N}_{i,j}$ are the normalized values respectively
of the data samples at the ith row and jth column of the
reference and target subarrays, $R_{i,j}$, $T_{i,j}$ are the values
respectively of the data samples at the ith row and jth
column of the reference and target subarrays, $\alpha_T$, $\alpha_R$
being weighting factors, and $\bar{R}_{i,j}$, $\bar{T}_{i,j}$ are local average
values for the ith row and jth column of the data
samples over the reference and target subarrays
respectively that are given by:

[equations for $\bar{R}_{i,j}$ and $\bar{T}_{i,j}$ not recoverable from the source]

where K is size of an averaging window.

52. The method of claim 51, wherein said
crosscorrelation values are computed according to the
following equation:

[equation for $C_{i,j}(n,m)$ not recoverable from the source]

where W is the crosscorrelation window in the ±Y
direction and V is the crosscorrelation window in the ±X
direction;

where m is the number of columns offset from the
reference position;

n is the number of rows offset from the reference
position; and

$C_{i,j}(n,m)$ is a correlation coefficient at data sample
position (i,j) at offsets n, m.

53. The method of claim 43, said selecting step
selecting the reference subarray that corresponds to a
maximum set of crosscorrelation values.

54. The method of claim 43, further comprising
collecting intensity values related to light modified by
the specimen at a plurality of sites, wherein said
generating step includes interpolating said intensity
values to obtain said two-dimensional array of data
samples.

55. A method for detecting anomalies in a
specimen such as a semiconductor wafer, said method
comprising the steps of:

scanning a light beam across the specimen along
scan lines;

detecting light originating from the light beam
after such light has been modified by the specimen for
detecting anomalies in the specimen; and

controlling intensity of the light beam as a
function of reference data to correct for variations in
the detected light caused by optical characteristics of
the specimen apart from the anomalies.

56. The method of claim 55, further comprising
storing signals representative of light that is detected
in the detecting step, said controlling step controlling
intensity of the light beam as a function of said stored
signals.

57. The method of claim 55, said detecting step
detecting a peak value and/or an integrated or average
value of the intensity of the light modified by the
specimen along a first scan line, said controlling step
controlling intensity of the light beam in response to
light detected in the detecting step for scanning a
second scan line according to the peak value and/or the
integrated or average value of the intensity of the
light modified by the specimen along the first scan
line.
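
Line-to-line intensity control of this kind might look like the sketch below. It assumes the detected signal scales linearly with beam power, that the peak of the first scan line is the controlling quantity, and that the power is clamped to a permitted range; `next_beam_power` and all of its parameters are hypothetical.

```python
def next_beam_power(line_samples, current_power, target_peak,
                    min_power, max_power):
    """Beam power for the next scan line from the previous line's peak.

    Assumes a linear detector response, so scaling the power by
    target_peak / peak would have brought the previous peak to target_peak.
    """
    peak = max(line_samples)
    if peak <= 0:
        return max_power  # nothing detected: use full power (assumption)
    proposed = current_power * target_peak / peak
    return min(max_power, max(min_power, proposed))
```

Replacing `max(line_samples)` with the mean of the samples would give the integrated or average variant recited in the claim.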

58. The method of claim 55, said scanning step
employing an acousto-optic deflector, said controlling
step applying to the acousto-optic deflector a signal
that is a function of the data.

59. An apparatus for detecting anomalies in a
specimen such as a semiconductor wafer, comprising:

means for scanning a light beam across the specimen
along scan lines;

a detector device detecting light originating from
the light beam after such light has been modified by the
specimen for detecting anomalies in the specimen; and

means for controlling intensity of the light beam
as a function of reference data to correct for
variations in the detected light caused by optical
characteristics of the specimen apart from the
anomalies.

60. The apparatus of claim 59, further comprising
a storage storing signals representative of light that
is detected by the detector device, said controlling
means controlling intensity of the light beam as a
function of said stored signals.

61. The apparatus of claim 59, said detector
device detecting a peak value and/or an integrated or
average value of the intensity of the light modified by
the specimen along a first scan line, said controlling
means controlling intensity of the light beam in
response to light detected by the detector device for
scanning a second scan line according to the peak value
and/or the integrated or average value of the intensity
of the light modified by the specimen along the first
scan line.

62. The apparatus of claim 59, said scanning means
including an acousto-optic deflector, said controlling
means applying to the acousto-optic deflector a signal
that is a function of the data.

63. An apparatus for correcting misregistration
errors in a system for detecting anomalies in a specimen
such as a semiconductor wafer, said system illuminating
a surface of the specimen at a two-dimensional array of
locations in rows and columns; said apparatus
comprising:

means for generating a two-dimensional array of
data samples in rows and columns, each data sample
representing light modified by the specimen at a
corresponding location in the two-dimensional array of
locations of the data sample;

means for defining one-dimensional groups of data
samples, each group of data samples or portion thereof
defining a vector;

means for providing a reference vector that is an
average of selected vectors of data samples for at least
one vector in the two-dimensional array; and

means for processing the at least one vector and
the reference vector to correct for misregistration.

64. The apparatus of claim 63, said defining means
comprising means for defining each group of data samples
to be a row of data samples.

65. The apparatus of claim 64, wherein said
generating means generates the rows of data samples
sequentially, said processing means comprising:

two signal processing paths for processing the rows
of data samples; and

time delay means for introducing a time delay
between the two signal processing paths in order to
cause a time shift between the reference vector and the
at least one vector.

66. The apparatus of claim 64, said processing
means processing said at least one vector and a
plurality of sets of data samples of the reference
vector to obtain a plurality of offset values, said time
delay means introducing different time delays between
the two signal processing paths in order to generate
said plurality of sets of data samples of the reference
vector.
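
The effect of a time delay between two signal processing paths can be modelled in software with a simple delay line. This is only an analogy for what the claim describes in hardware terms; it assumes samples arrive one at a time and that a delay of d samples corresponds to an offset of d positions between the two paths. `delayed_pairs` is a hypothetical name.

```python
from collections import deque


def delayed_pairs(samples, delay):
    """Yield (current, delayed) pairs; delayed lags current by `delay` samples.

    Until the delay line has filled, the delayed output is None.
    """
    line = deque([None] * delay, maxlen=delay + 1)
    for s in samples:
        line.append(s)        # the oldest entry drops out automatically
        yield s, line[0]
```

Running the same sample stream through several delay values would produce the differently shifted sets of reference samples against which a vector is compared.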

67. An apparatus for correcting misregistration
errors in a system for detecting anomalies in a specimen
such as a semiconductor wafer, said system illuminating
a surface of the specimen at a two-dimensional array of
locations in rows and columns; said apparatus
comprising:

means for generating a two-dimensional array of
data samples in rows and columns, each data sample in a
row and column representing light modified by the
specimen at a corresponding location in the row and
column of locations;

means for comparing the data samples of a target
subarray and the data samples of each of a plurality of
reference subarrays, or between signals derived
therefrom, the target and the reference subarrays having
the same dimensions, one of the reference subarrays
being at a reference position and the remaining
reference subarrays being offset from the reference
position by at least one row and/or column of data
samples to select a pair of offset values; and

means for repositioning the target subarray
according to the pair of offset values.

68. The apparatus of claim 67, said comparing
means including means for computing a residual value
between the data samples of a target subarray and the
data samples of each of a plurality of reference
subarrays, or between signals derived therefrom, to
obtain a plurality of residual values each corresponding
to one of said plurality of reference subarrays.

69. The apparatus of claim 68, wherein said
generating means generates the rows of data samples
sequentially, said computing means comprising:

two signal processing paths for processing the rows
of data samples; and

time delay means for introducing a time delay
between the two signal processing paths in order to
generate the reference subarrays.

70. An apparatus for correcting misregistration
errors in a system for detecting anomalies in a specimen
such as a semiconductor wafer, said system illuminating
a surface of the specimen at a two-dimensional array of
locations in rows and columns; said apparatus
comprising:

means for generating a two-dimensional array of
data samples in rows and columns, each data sample in a
row and column representing light modified by the
specimen at a corresponding location in a row and column
of locations;

means for crosscorrelating a target subarray and
each of a plurality of reference subarrays of data, or
signals derived therefrom, to obtain a plurality of sets
of crosscorrelation values, the target and the reference
subarrays having the same dimensions, one of the
reference subarrays being at a reference position and
the remaining reference subarrays being offset from the
reference position by at least one row and/or column of
data samples; and